ABSTRACT

We investigate the ability of current implementations of galaxy group finders to recover
colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred
statistics, we run three different group finders used in the literature over a mock that includes
galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are
remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy.
However, it is also clear that certain systematic errors arise as a consequence of correlated
errors in group membership determination, central/satellite designation, and halo mass
assignment. We introduce a new statistic, the halo transition probability (HTP), which captures
the combined impact of all these errors. As a rule of thumb, errors tend to equalize the
properties of distinct galaxy populations (i.e. red vs. blue galaxies or centrals vs. satellites), and
to result in inferred occupation statistics that are more accurate for red galaxies than for blue
galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red
fraction of central galaxies as a function of halo mass. Group finders do a good job in
recovering galactic conformity, but also have a tendency to introduce weak conformity when none
is present. We conclude that proper inference of colour-dependent statistics from group
catalogues is best achieved using forward modelling (i.e., running group finders over mock data),
or by implementing a correction scheme based on the HTP, as long as the latter is not too
strongly model-dependent.

Key words: galaxies: clusters: general - galaxies: haloes - galaxies: evolution - galaxies: 
statistics - methods: statistical 


1 INTRODUCTION 

The natural site of galaxy formation in the standard ΛCDM
cosmological model of hierarchical structure formation is within
gravitationally bound dark matter structures, called dark matter haloes.
Given the hierarchical nature of structure formation, the observed
halo substructure present in dark matter N-body simulations, and
the long observed propensity of luminous galaxies to live in groups
and clusters surrounded by less luminous neighbours, we expect
galaxies to group into systems which share a common dark matter
halo, a galaxy group. Galaxy groups suggest a fundamental
physical scale important for galaxy evolution, namely the extent
of dark matter haloes, and motivate studying galaxies in groups,
where group/cluster specific processes may influence galaxy
evolution, by use of group finders to construct galaxy group catalogues.

Spectroscopic redshift surveys are necessary to study galaxies
in groups, because precise redshift determinations minimize
challenges associated with projection effects on determining galaxy
group memberships. Identifying galaxy groups in redshift surveys
has a long history. Early on, Turner & Gott (1976) identified groups
in the Catalogue of Galaxies and Clusters of Galaxies, followed by
Huchra & Geller (1982) and Geller & Huchra (1983), who used
the Center for Astrophysics Redshift Survey to search for galaxy
groups. This was followed by other studies identifying galaxy
groups in the Las Campanas Redshift Survey (Tucker et al. 2000),
the Two Degree Field Galaxy Redshift Survey (Merchan & Zandivarez
2002; Eke et al. 2004; Yang et al. 2005), the Sloan Digital
Sky Survey (Merchan & Zandivarez 2005; Miller et al. 2005; Yang
et al. 2007; Berlind et al. 2006; Tago et al. 2008, 2010; Tinker et al.
2011; Tempel et al. 2012; Munoz-Cuartas & Muller 2012; Tempel
et al. 2014) and the Galaxy And Mass Assembly survey (Robotham
et al. 2011).



In particular, the Sloan Digital Sky Survey (SDSS, York et al.
2000) has been instrumental to the study of galaxy groups by
providing the largest precise redshift space galaxy maps to date, along
with simultaneous detailed information on galaxy properties. The
demographics of galaxies provide a challenge for galaxy evolution
models to predict and a motivation to understand galaxies’
environment. A colour magnitude diagram of galaxies shows a
bi-modality in colour (e.g. see Blanton et al. 2003b). This bi-modality
is observed for galaxies both in the local universe and out to larger
redshifts (Bell et al. 2004). The bi-modality in colour is the result
of a bi-modality of specific star formation rates (sSFR), dividing
galaxies into a star forming blue cloud and a more quiescent red
sequence. The persistence of this distribution is well established,
but the origin of the relation is not well understood. In addition,
many other galaxy properties are correlated without a clear
physical explanation, e.g. galaxies with late-type morphologies tend to
be star forming, and vice versa for early-type morphologies, and
galaxies with increasing stellar mass tend to be quenched.

It is clear that galaxy properties correlate with environment
in the local universe. Galaxies in dense environments such as
groups and clusters display an enhanced quenched fraction and
an enhanced early-type morphology fraction relative to galaxies
in isolated environments (Dressler 1980; Postman & Geller 1984;
Balogh et al. 2004; Hogg et al. 2004; Kauffmann et al. 2004;
Blanton et al. 2005a), and these qualitative trends persist out to higher
redshifts (Cucciati et al. 2006; Cooper et al. 2007; Peng et al. 2010).
It is not clear to what extent a causal relationship exists between
environment and galaxy properties and to what extent galaxy
properties are determined by processes specific to galaxies in the centre
of their own halo.

Within the context of dark matter halo environments, it has
become customary to discuss galaxies in terms of central galaxies and
satellite galaxies. Central galaxies are those galaxies that occupy
the central region of a host halo and are associated with the most
massive progenitor of that host halo. Satellite galaxies are those
galaxies that occupy host haloes and are not the central galaxy; they
are generally less massive than the central and associated with dark
matter sub-haloes. It is important to distinguish between these two types
of galaxies as they are subject to different physical processes. A
satellite galaxy must have transitioned from being a central in its
own host halo to a satellite by crossing into the virial volume of a
more massive halo. The transition from the field prevents dark
matter and cold gas from accreting onto the satellite’s subhalo (Larson
et al. 1980). Falling into this environment can result in
the thermalization and/or stripping of gas from the satellite galaxy
(Balogh et al. 2000; Grebel et al. 2003; Kawata & Mulchaey 2008;
McCarthy et al. 2008). Furthermore, satellites are subject to tidal
forces from the central potential, stripping mass or disrupting the
subhalo, and gravitational interactions with other satellites (Farouki
& Shapiro 1981; Moore et al. 1998). Meanwhile, central galaxies
grow from the cannibalization of satellites, and the accretion
of more dark matter and gas (Purcell et al. 2007). Distinguishing
between these two populations allows for the study of the relative
importance of these various processes.

From studies utilizing galaxy group catalogues it is clear that
the quenched fraction and early-type fraction of centrals and satellites
increase with increasing host halo mass (Weinmann et al. 2006;
Yang et al. 2008). There is also evidence that there is a radial
dependence within a halo of the satellite quenched fraction (Weinmann
et al. 2006; Wetzel et al. 2012, 2014; Watson et al. 2015), where
the quenched fraction falls for increasing radial distance from the
halo centre. It is vital that these trends be determined accurately to
provide quantitative constraints on galaxy formation models.
Using constraints from galaxy group catalogues, Wetzel et al. (2014)
suggest that satellite galaxies quench with a delay time after in-fall
into a host halo. This model predicts a population of ejected
satellite galaxies (see also Teyssier et al. 2012) with decreased sSFR,
galaxies that have passed through the virial volume of a host halo,
but whose orbits have taken them back out beyond the virial
radius of the host halo. The effect of pre-processing (Zabludoff &
Mulchaey 1998), where satellite galaxies are quenched in a lower
mass host halo before being accreted by the current host, can be
directly studied with group catalogues. Hou et al. (2014) use galaxy
group catalogues to identify “sub-groups”, measuring an increased
quenched fraction in sub-groups as evidence that pre-processing is
important to understand satellite quenching.

Galaxy group catalogues have been used to measure galaxy
property correlations beyond those with central/satellite
designation and halo mass. One effect of particular interest here is galactic
conformity, first noticed by Weinmann et al. (2006). We define such
an effect as the tendency of star forming central galaxies to reside
in groups with star forming satellites, and vice versa for quenched
centrals and satellites, at fixed halo mass. Subsequent studies have
detected evidence for conformity in the spirit of this original
measurement (Kauffmann et al. 2013; Phillips et al. 2014a,b; Knobel
et al. 2014) and at higher redshifts (Hartley et al. 2014). The
robustness of conformity detections and the physical interpretation
of conformity are not yet settled. Adding to this picture, Kauffmann
et al. (2013) see further evidence for a correlation of star formation
rates between central galaxies and neighbouring galaxies beyond
the host halo virial radius, a conceptually different effect than
previous measurements of conformity. We make a distinction between
previous measurements, termed 1-halo conformity, and the Kauffmann
et al. (2013) large scale phenomenon, termed 2-halo conformity,
an effect further elaborated on by Hearin, Watson, & van den
Bosch (2014b) as a signature of assembly bias.

The primary goal of a group finding algorithm is to partition 
a sample of galaxies into the constituent groups. An ideal group 
finder would result in a perfect mapping between galaxies which 
occupy a common halo and a group within a group catalogue. In 
reality, because any algorithm is limited to work with observations 
in redshift space, it is not possible to perfectly assign galaxies to 
groups. The result is two classes of errors: first, an algorithm may
include a galaxy (or galaxies) which is not part of a halo population
into the corresponding group; such galaxies are referred to as
interlopers. Second, an algorithm may misplace a galaxy into an
unrelated group.

There has not been a thorough investigation of the accuracy
with which group catalogues reproduce unbiased measurements of galaxy
properties as a function of halo properties. Tinker et al. (2011) note
that impurities in satellite and central galaxy samples in a group
catalogue will bias the quenched fractions measured from group
catalogues, and Hou et al. (2014) speculate that group membership
contamination may influence the measurements of star formation
in galaxy groups as a function of environment. Duarte & Mamon
(2014b) recently developed a method to take into account the
uncertainties in the group finding process; however, it is unclear if this
will improve the situation. In this paper, we investigate, in detail,
the ability of group finders to recover galaxy(-group) properties as
a function of halo properties, and how group catalogue failures skew
measurements.

We choose a set of group finding algorithms from the literature
and describe them along with the mock we use to test them in §2.
We then discuss and quantify the errors group finders make in §3
before we compare the inferred occupation statistics from the group
catalogues to the true occupation statistics in §4. We conclude with
a discussion of our results and a summary in §5 and §6.


2 METHODOLOGY 

The main goal of this paper is to assess how well group finding
algorithms can recover colour-specific occupation statistics of the
galaxy population. To that end, we run three different group finders
over a realistic mock galaxy redshift survey, and compare the
inferred group statistics to the true, underlying trends in the mocks.
In the following, we describe the construction of our mock redshift
survey (§2.1) and the different group finding algorithms (§2.2) used
in this study.


2.1 Mock Catalogues 


In order to assess the accuracy with which group finders can
recover colour-dependent group statistics of the galaxy population,
we need realistic mock catalogues that incorporate galaxy colours.
We construct such a mock by populating dark matter haloes in a
large N-body simulation (dark matter only) with galaxies of different
luminosities and colours, using both subhalo abundance matching
(Kravtsov et al. 2004; Vale & Ostriker 2004; Tasitsiomi et al.
2004; Conroy et al. 2006; Shankar et al. 2006; Trujillo-Gomez et al.
2011; Rodriguez-Puebla et al. 2012; Watson et al. 2012) and
age-matching (Hearin & Watson 2013), as described below. These
techniques have the advantage that, by construction, the mock
population has exactly the same luminosity-distribution and
colour-distribution as the real data. In addition, as numerous studies have
shown, abundance matching and age-matching are also extremely
successful in reproducing various 2-point statistics (e.g., galaxy
correlation functions, the excess surface densities inferred from
galaxy-galaxy lensing), indicating that the galaxies are placed in the
correct dark matter haloes. Furthermore, age-matching ‘naturally’
reproduces galactic conformity, both on small (‘1-halo’) and large
(‘2-halo’) scales (Hearin, Watson, & van den Bosch 2014b).

In what follows, we present a more detailed description of the 
numerical simulation used, and the methods used to populate the 
haloes in the simulation with galaxies. 


2.1.1 Numerical Simulation 

The numerical N-body simulation used for the construction of our
mock catalogues is the Bolshoi simulation (Klypin et al. 2011),
which follows the evolution of $2048^3$ dark matter particles
using the Adaptive Refinement Tree (ART) code (Kravtsov, Klypin,
& Khokhlov 1997) in a flat ΛCDM cosmology with parameters
$\Omega_{\rm m,0} = 1 - \Omega_{\Lambda,0} = 0.27$, $\Omega_{\rm b,0} = 0.0469$, $n_{\rm s} = 0.95$,
$\sigma_8 = 0.82$, and $h = H_0/(100\,{\rm km\,s^{-1}\,Mpc^{-1}}) = 0.7$
(hereafter ‘Bolshoi cosmology’). The box size of the Bolshoi
simulation is $L_{\rm box} = 250\,h^{-1}\,{\rm Mpc}$, resulting in a particle mass of
$m_{\rm p} = 1.35 \times 10^{8}\,h^{-1}\,M_\odot$.
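
The quoted particle mass can be cross-checked from the box size, particle number and matter density. The following is a minimal sketch, assuming the standard value of the critical density, $\rho_{\rm crit} \simeq 2.775 \times 10^{11}\,h^2\,M_\odot\,{\rm Mpc^{-3}}$, which is not quoted in the text; the function name is purely illustrative.

```python
# Sanity check of the particle mass from the simulation parameters.
# RHO_CRIT is an assumed standard constant, not a value from the text.
RHO_CRIT = 2.775e11   # critical density in h^2 Msun / Mpc^3
OMEGA_M = 0.27        # matter density parameter (Bolshoi cosmology)
L_BOX = 250.0         # box side in Mpc/h
N_PART = 2048 ** 3    # number of dark matter particles


def particle_mass(omega_m=OMEGA_M, l_box=L_BOX, n_part=N_PART):
    """Mass per particle in Msun/h: total matter mass in the box over N."""
    return omega_m * RHO_CRIT * l_box ** 3 / n_part


print(f"m_p = {particle_mass():.3e} Msun/h")
```

The result agrees with the quoted $1.35 \times 10^{8}\,h^{-1}\,M_\odot$ at the per cent level; the residual difference depends on the adopted constants.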

We use the publicly available redshift zero halo catalogue$^1$
obtained using the phase-space halo finder ROCKSTAR (Behroozi
et al. 2013, 2012), which uses adaptive, hierarchical refinement of
friends-of-friends groups in six phase-space dimensions and one
time dimension. As demonstrated in Knebe et al. (2011, 2013), this
results in a very robust tracking of (sub-)haloes (see also van den
Bosch & Jiang 2014). Haloes in this catalogue are defined to be
spherical volumes centred on a local density peak (SO hereafter),
such that the average density inside the sphere is $\Delta_{\rm vir} = 360$ times
the mean matter density of the simulation box. The radius of each
such sphere defines the virial radius $R_{\rm vir}$ of the halo, which is
related to the mass of the halo via
$M_{\rm vir} = (4/3)\pi R_{\rm vir}^3 \Delta_{\rm vir} \Omega_{\rm m} \rho_{\rm crit}$,
where $\rho_{\rm crit} = 3H_0^2/(8\pi G)$ is the critical energy density of the
Universe. Additionally, sub-haloes in this catalogue are distinct,
self-bound structures whose centre is found within the virial radius of a
more massive host halo. For each host and sub-halo, the catalogue
also lists the maximum circular velocity,
$V_{\rm max} = {\rm Max}[\sqrt{GM(<r)/r}]$, where $M(<r)$ is the mass enclosed
within a distance $r$ of the (sub-)halo centre, as well as $V_{\rm peak}$,
which is defined as the halo’s peak value of $V_{\rm max}$ over its entire history.


2.1.2 Populating Haloes with Galaxies 

We populate the host haloes and sub-haloes in the Bolshoi
simulation with galaxies of Petrosian r-band luminosity$^2$, $L_r$, using
the popular subhalo abundance matching technique, which
operates on the premise that there is a tight relation between a galaxy’s
$L_r$ and some property of its dark matter halo. The property considered
here is the peak maximum circular velocity, $V_{\rm peak}$. It has been shown
that abundance matching works best (i.e. it yields results in
closest agreement with observations) for this particular halo parameter
(Reddick et al. 2013; Hearin et al. 2013).

We start by assigning r-band luminosities to haloes and
sub-haloes using the implicit relation

$$n_{\rm gal}(>L_r) = n_{\rm h}(>V_{\rm peak}), \tag{1}$$

where $n_{\rm gal}(>L_r)$ is the number density of observed galaxies with
r-band luminosities larger than $L_r$, which we compute using the
SDSS r-band luminosity function of Blanton et al. (2005b), and
$n_{\rm h}(>V_{\rm peak})$ is the number density of dark matter haloes and
sub-haloes with peak circular velocity larger than $V_{\rm peak}$, which we
obtain from the Bolshoi simulation. Because of the finite mass
resolution of the simulation, and the magnitude limit of the SDSS
data, we only assign galaxies to haloes that have r-band
magnitudes $M_r - 5\log(h) < -19$. This results in populating all haloes
down to $V_{\rm peak} \sim 100\,{\rm km\,s^{-1}}$.
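
The implicit relation of Eq. (1) amounts to rank-ordering (sub-)haloes by $V_{\rm peak}$ and inverting the cumulative luminosity function. A minimal sketch of this step is given below; it assumes that the cumulative luminosity function is supplied as a table (`lum_grid`, `n_gal_cum`), and the function and argument names are illustrative rather than taken from any published code. No scatter is included at this stage.

```python
import numpy as np


def abundance_match(v_peak, volume, lum_grid, n_gal_cum):
    """Assign luminosities such that n_gal(>L) = n_h(>V_peak).

    v_peak    : peak circular velocities of all (sub-)haloes
    volume    : simulation volume (inverse units of n_gal_cum)
    lum_grid  : increasing luminosities where the cumulative LF is tabulated
    n_gal_cum : n_gal(>L) on lum_grid (strictly decreasing)
    """
    v_peak = np.asarray(v_peak, dtype=float)
    lum_grid = np.asarray(lum_grid, dtype=float)
    n_gal_cum = np.asarray(n_gal_cum, dtype=float)

    # Rank haloes from highest to lowest V_peak (rank 1 = largest).
    order = np.argsort(-v_peak)
    ranks = np.empty(len(v_peak))
    ranks[order] = np.arange(1, len(v_peak) + 1)
    n_halo_cum = ranks / volume

    # Invert n_gal(>L) in log-log space; np.interp needs increasing x,
    # so the (decreasing) table is reversed.
    log_n = np.log10(n_gal_cum[::-1])
    log_l = np.log10(lum_grid[::-1])
    return 10.0 ** np.interp(np.log10(n_halo_cum), log_n, log_l)
```

By construction the halo with the largest $V_{\rm peak}$ receives the largest luminosity, and the luminosity function of the mock matches the input table.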

After this first step, we add stochasticity to the monotonic
relationship between $L_r$ and $V_{\rm peak}$, using the method described in
Appendix A of Hearin & Watson (2013). Some amount of scatter
is expected, given the complex interplay of physical processes in
galaxy formation, and results in mocks that are in better
agreement with observational data (e.g., Klypin et al. 2011;
Trujillo-Gomez et al. 2011; Watson et al. 2012, and references therein) than
mocks without scatter. Our model for the stochasticity in the
luminosity of mock galaxies results in a uniform scatter in luminosity
of $\sim 0.15$ dex at fixed $V_{\rm peak}$. Due to the scatter between $M_{\rm vir}$
and $V_{\rm peak}$, this translates into $\sim 0.18$ dex of scatter in luminosity
at fixed $M_{\rm vir}$, which is in excellent agreement with observational
constraints (e.g., Cooray 2006; Yang et al. 2009; More et al. 2011;
Cacciato et al. 2013). The result of this process is the distribution
of $V_{\rm peak}$ - $M_{\rm host}$ for central and satellite galaxies shown in Fig. 1.

1 http://hipacc.ucsc.edu/Bolshoi/MergerTrees.html

2 Throughout this paper, all luminosities, magnitudes, and colours are
k-corrected to z = 0.1, using the model described in Blanton et al. (2003a).

Figure 1. A random sub-sampling of host halo mass, $M_{\rm host}$, vs. peak halo
circular velocity, $V_{\rm peak}$, of mock galaxies. The horizontal dashed black line
is the $V_{\rm peak}$ cut corresponding to $M_r - 5\log(h) < -19$. The vertical
dashed black line is the cut on halo mass, $M_{\rm host}$, that results in the
equivalent number density of haloes.

Finally, we assign $(g - r)$ colours to all galaxies in our mock
catalogue using the age-matching technique introduced by Hearin
& Watson (2013) and Hearin et al. (2014a). Age-matching
operates on the premise that, at fixed luminosity, there is a
monotonic relation between galaxy colour and some proxy for (sub-)halo
age. First, all mock galaxies in a narrow bin of r-band luminosity
are rank-ordered according to this halo age proxy. Next, for each
galaxy in the bin, a colour is drawn from the observed distribution
in the SDSS, $P(g - r\,|\,L_r)$. These colours are also rank-ordered,
and subsequently each galaxy in the bin is assigned a colour by
matching rank-orders, such that the reddest (bluest) colour is
assigned to the galaxy whose (sub-)halo has the oldest (youngest)
age$^3$. We refer the reader to the original papers for details
regarding the exact definition of halo age used, but roughly it corresponds
to the time when the main progenitor of the (sub-)halo first reaches
a mass equal to four percent of its present day mass. We note that
there is no scatter included in the relation between halo age and
assigned colour, and we leave an examination of a tunable assembly
bias mock to a forthcoming paper.
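
The rank-matching step described above can be sketched for a single luminosity bin as follows. This is a minimal illustration, assuming that a sample of observed colours stands in for $P(g - r\,|\,L_r)$ and that a larger value of the age proxy means an older (sub-)halo; the function name and arguments are hypothetical.

```python
import numpy as np


def age_match_bin(halo_age, observed_colours, rng):
    """Assign colours to the mock galaxies of ONE luminosity bin.

    halo_age         : age proxy of each galaxy's (sub-)halo (larger = older)
    observed_colours : sample of (g - r) colours representing P(g-r | L_r)
    rng              : numpy random Generator used to draw the colours
    """
    halo_age = np.asarray(halo_age, dtype=float)
    # Draw one colour per mock galaxy from the observed distribution.
    drawn = rng.choice(np.asarray(observed_colours, dtype=float),
                       size=len(halo_age))
    colours = np.empty(len(halo_age))
    # Colours sorted from bluest to reddest are handed to haloes sorted
    # from youngest to oldest, so the oldest halo gets the reddest colour.
    colours[np.argsort(halo_age)] = np.sort(drawn)
    return colours
```

Because the assignment is a pure rank match, the colour distribution in the bin reproduces the drawn sample exactly, and no scatter enters the relation between halo age and colour.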

3 The resulting mock catalogue is publicly available at
http://logrus.uchicago.edu/~aphearin

Figure 2. A random sub-sampling of the $(g - r)$ colours vs. r-band absolute
magnitude of mock galaxies. The black line (Eq. 2) is the colour cut used to
define the red (red points) and blue (blue points) sub-samples of galaxies.

Once all mock galaxies have been assigned luminosities and
colours, we split the sample into ‘red’ and ‘blue’ sub-samples,
using a magnitude-dependent cut of Weinmann et al. (2006), which
roughly follows the observed bi-modality in the colour-magnitude
relation:

$$(g - r)_{\rm cut} = 0.7 - 0.032\,[M_r - 5\log h + 16.5], \tag{2}$$

(see Fig. 2). In what follows, we refer to galaxies that are redder
and bluer than $(g - r)_{\rm cut}$ as ‘red’ and ‘blue’ galaxies, respectively.
We refer the reader to Taylor et al. (2015) for a thorough discussion
of definitions of ‘red’ and ‘blue’ galaxies, and the consequences of
our choice.
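
For concreteness, the colour cut of Eq. (2) can be written as a short function. This is only a sketch; the function names are illustrative.

```python
import numpy as np


def colour_cut(abs_mag_r):
    """(g - r)_cut of Eq. (2); abs_mag_r is M_r - 5 log h."""
    return 0.7 - 0.032 * (np.asarray(abs_mag_r, dtype=float) + 16.5)


def is_red(g_minus_r, abs_mag_r):
    """True where a galaxy is redder than the magnitude-dependent cut."""
    return np.asarray(g_minus_r, dtype=float) > colour_cut(abs_mag_r)
```

At the faint limit of the sample, $M_r - 5\log h = -19$, the cut lies at $(g - r) = 0.78$, and it moves redward for brighter galaxies.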

2.1.3 Mock Redshift Survey 

As the final step, we construct volume limited galaxy redshift
surveys from the age-matching mock. We place a virtual observer at
one of the corners of the simulation box, define a $(\alpha, \delta)$ (right
ascension, declination) coordinate frame, and compute, for each
mock galaxy, its angular coordinates, its redshift, $z$ (accounting for
the peculiar velocity along the line-of-sight towards the virtual
observer), and its apparent magnitudes in the r and g bands, $m_r$ and
$m_g$, respectively. Next, we remove all galaxies with $m_r > 17.77$,
mimicking the apparent magnitude limit of the SDSS spectroscopic
survey. From the resulting catalogue we construct a volume limited
sample of galaxies with $M_r - 5\log h < -19.0$ for which the following
information is available: $(\alpha, \delta, z, m_r, m_g)$.
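
The conversion from box coordinates to survey observables can be sketched as below. This is an illustration under stated assumptions, not the procedure actually used for the mock: the observer sits at the origin, the cosmological redshift is approximated by the low-redshift Hubble law $z_{\rm cos} = H_0 d / c$ instead of the full distance-redshift relation, and apparent magnitudes are not computed. All names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def observe(pos, vel, h0=100.0):
    """Angular coordinates and observed redshift for an observer at the origin.

    pos : (N, 3) comoving positions in Mpc/h
    vel : (N, 3) peculiar velocities in km/s
    h0  : Hubble constant in h km/s/Mpc
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    dist = np.linalg.norm(pos, axis=1)
    los = pos / dist[:, None]                 # unit line-of-sight vectors
    v_los = np.sum(vel * los, axis=1)         # line-of-sight peculiar velocity
    ra = np.degrees(np.arctan2(pos[:, 1], pos[:, 0])) % 360.0
    dec = np.degrees(np.arcsin(los[:, 2]))
    z_cos = h0 * dist / C_KMS                 # low-z approximation (assumed)
    z_obs = (1.0 + z_cos) * (1.0 + v_los / C_KMS) - 1.0
    return ra, dec, z_obs
```

The peculiar velocity enters only through its line-of-sight component, which is what produces redshift-space distortions in the mock survey.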

We do not incorporate observational errors in these
observables, nor do we model the spectroscopic incompleteness of a
realistic survey. Yang et al. (2007), for whom it is straightforward to
examine the effect empirically, find that spectroscopic
incompleteness in SDSS has a minimal effect on their results. We leave a
further discussion of this issue to §5.3. The magnitudes are simply
redshifted Petrosian magnitudes from our mock. While we are
consistent with our use of a magnitude system, it should be noted that
the use of a different magnitude system may introduce systematic
changes in our mock and therefore in our results for the group
finders run on our mock. We have tested that errors in magnitudes and
redshifts similar to those in SDSS have a negligible effect on all of
our results.


2.2 Group Finding Algorithms 


In order to gauge the accuracy with which group finders can
extract colour-dependent statistics from galaxy redshift surveys, we
run three different group finders over the mock galaxy catalogue
described above. The three group finders used are the
friends-of-friends (FoF) based group finder of Berlind et al. (2006), the
halo-based group finder developed by Yang et al. (2005, 2007), and the
modified version thereof used by Tinker et al. (2011). The Yang et
al. and Tinker et al. group finders assume that dark matter haloes
are SO, and use the galaxy luminosities in assigning group
memberships. They mainly differ in the starting points used to build the
groups. The Berlind et al. group finder, on the other hand, makes
no such assumption about halo shape, and ignores all information
regarding galaxy luminosities when partitioning galaxies among
groups. These three group finders are representative of group
finders in general; most group finders used in the literature use
algorithms (or combinations thereof) that are similar to those employed
in the group finders used here.

In the following sections we give a brief description of each 
group finder’s method to assign membership to groups, and we 
point the reader to the relevant papers for further details on the 
methods used. Finally, we describe the specific algorithms used for 
all three group finders to estimate halo mass and central/satellite 
designation for individual groups. 


2.2.1 Berlind et al. group finder 


Berlind et al. (2006) adopt a simple FoF algorithm to identify
galaxy groups, which is the most common algorithm used to
select groups from a redshift survey (e.g., Huchra & Geller 1982;
Geller & Huchra 1983; Nolthenius & White 1987; Ramella et al.
1989, 1997, 1999, 2002; Moore et al. 1993; Tucker et al. 2000;
Giuricin et al. 2000; Merchan & Zandivarez 2002; Eke et al. 2004).
Following Huchra & Geller (1982), a pair of galaxies is linked if
both their transverse and line-of-sight separations are smaller than
a specified pair of projected and line-of-sight linking lengths,
respectively. Formally, two galaxies, $i$ and $j$, with an angular
separation, $\theta_{ij}$, and observed redshifts, $z_i$ and $z_j$, have a line-of-sight
separation given by,


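$$D_{\parallel,ij} = \frac{c}{H_0}\,|z_i - z_j|, \tag{3}$$

and a projected separation given by,

$$D_{\perp,ij} = \frac{c}{H_0}\,(z_i + z_j)\,\sin\left(\frac{\theta_{ij}}{2}\right). \tag{4}$$

The linking condition is,

$$D_{\parallel,ij} \le b_\parallel\, n_g^{-1/3}, \tag{5}$$

and

$$D_{\perp,ij} \le b_\perp\, n_g^{-1/3}, \tag{6}$$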

where $n_g$ is the global, mean number density of galaxies and $b_\perp$
and $b_\parallel$ are the projected and line-of-sight linking lengths,
respectively, in units of the mean inter-galaxy separation. The FoF
algorithm is recursive, and links all galaxies that obey the above linking
condition to each other, thus yielding a unique group of galaxies.
Note that this group finding algorithm uses only galaxy angular
positions and observed redshifts, and not galaxy luminosities.
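
The linking conditions of Eqs. (3)-(6) can be illustrated with a brute-force sketch. The code below is not the Berlind et al. implementation: it loops over all pairs, which is only practical for small samples, and uses a simple union-find structure to merge linked galaxies. The default linking lengths are the values quoted for this study; all function and variable names are illustrative.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def fof_groups(ra, dec, z, n_g, b_perp=0.14, b_par=0.75, h0=100.0):
    """Return one integer group label per galaxy (O(N^2) pair loop).

    ra, dec : angular positions in degrees
    z       : observed redshifts
    n_g     : mean galaxy number density in (Mpc/h)^-3
    h0      : Hubble constant in h km/s/Mpc
    """
    ra, dec = np.radians(ra), np.radians(dec)
    z = np.asarray(z, dtype=float)
    n = len(z)
    sep = n_g ** (-1.0 / 3.0)  # mean inter-galaxy separation
    # Unit vectors on the sky, used to obtain the angular separation.
    xyz = np.column_stack([np.cos(dec) * np.cos(ra),
                           np.cos(dec) * np.sin(ra),
                           np.sin(dec)])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            theta = np.arccos(np.clip(np.dot(xyz[i], xyz[j]), -1.0, 1.0))
            d_par = (C_KMS / h0) * abs(z[i] - z[j])
            d_perp = (C_KMS / h0) * (z[i] + z[j]) * np.sin(theta / 2.0)
            if d_par <= b_par * sep and d_perp <= b_perp * sep:
                parent[find(i)] = find(j)  # merge the two groups
    return np.array([find(i) for i in range(n)])
```

Galaxies that share a label form one group; an unlinked galaxy ends up as a group with a single member.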

The linking length must be tuned to minimize interlopers
and maximize completeness of group members. Typically, linking
lengths that are too small will decrease completeness, while
linking lengths that are too large will increase the number of interlopers
in groups. Berlind et al. (2006) used mock catalogues to tune their
linking lengths, using a number of criteria to judge the quality of
the resulting group catalogues. They end up using $b_\parallel = 0.75$ and
$b_\perp = 0.14$, corresponding to $\sim 3\,h^{-1}\,{\rm Mpc}$ and $\sim 0.6\,h^{-1}\,{\rm Mpc}$
for our sample, respectively. We adopt these values, for which they
find that groups with $N_g \ge 10$ have:

(i) an unbiased group multiplicity function;

(ii) a spurious group fraction less than $\sim 1\%$, where spurious
groups are fractured ‘pieces’ with no true central galaxy;

(iii) a halo completeness greater than $\sim 97\%$, which implies
that fewer than $\sim 3\%$ of the groups have more than one true central
among their members;

(iv) an unbiased projected size distribution as a function of
group multiplicity;

(v) a velocity dispersion distribution that is $\sim 20\%$ too low at
all group multiplicities.

Note that the performance is expected to be inferior for groups with
fewer assigned members. Finally, we stress that this group finder
was tuned using mocks in which the dark matter haloes are
identified using the FoF algorithm with a linking length of $b = 0.2$. In
addition, satellite galaxies were assigned the positions and
velocities of randomly selected dark matter particles within those FoF
haloes, and therefore do not occupy spherical volumes. This is
different from the mocks that we use in this paper, which may have
implications for our assessment of the performance of the Berlind
et al. group finder on our mocks (see discussion in §5.2).

2.2.2 Yang et al. group finder 

Yang et al. (2005, 2007) developed a halo-based group finder,
which has the advantage that it is iterative and based on an
adaptive filter modelled after the expected phase-space properties of
dark matter haloes. First, potential group centres and initial group
membership estimates are identified using a FoF algorithm with
very small linking lengths of $b_\parallel = 0.3$ and $b_\perp = 0.05$. The
geometrical, luminosity-weighted centres of all the FoF groups
thus identified with two members or more are considered as the
centres of potential groups. All galaxies not linked to any of
these FoF groups are also treated as tentative centres of potential
groups. Next, the characteristic group luminosity, $L_{19.5}$, is
computed, where $L_{19.5}$ is the total luminosity of all group members
with magnitude $M_r - 5\log h \le -19.5$. The characteristic group
luminosity is used together with an estimate of the group’s
mass-to-light ratio, $M_{180}/L_{19.5}$, to obtain an initial estimate of the group’s
halo mass, $M_{180}$. Note that here haloes are defined as SO with
an average overdensity of 180 times the mean background
density. In the first iteration, it is simply assumed that all groups have
$M_{180}/L_{19.5} = 500\,h\,M_\odot/L_\odot$. For all subsequent iterations,
however, the $M_{180}/L_{19.5}$ relation that derives from the previous
iteration is used, as described below. As demonstrated in Yang et al.
(2005), this iterative technique makes the final group catalogue
very insensitive to this (arbitrary) initial guess for $M_{180}/L_{19.5}$.

Assuming that dark matter haloes follow an NFW density
profile (Navarro, Frenk, & White 1997), and that the distribution of
galaxies in phase space follows that of the dark matter particles, the
number density contrast of galaxies in the redshift space around the
group centre (assumed to coincide with the centre of the halo) can
be written as

$$P_M(R, \Delta z) = \frac{H_0}{c}\,\frac{\Sigma(R)}{\bar{\rho}}\,p(\Delta z). \tag{7}$$

Here $c$ is the speed of light, $\Delta z = z - z_{\rm group}$, $\bar{\rho}$ is the average
density of the universe, $\Sigma(R)$ is the projected surface density of a
(spherical) NFW profile, and $p(\Delta z)\,{\rm d}\Delta z$ describes the redshift
distribution of galaxies within the halo and is assumed to have a Gaussian
form with a velocity dispersion equal to $\sigma_{180}(1 + z_{\rm group})$. Here
$\sigma_{180}$ is the one-dimensional velocity dispersion of an isotropic
NFW halo of mass $M_{180}$ (see van den Bosch et al. 2004).
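
The structure of Eq. (7) can be sketched numerically. The example below makes two assumptions that go beyond the text: the Gaussian $p(\Delta z)$ is normalized to unit integral with its dispersion converted to redshift units by dividing by $c$, and the ratio $\Sigma(R)/\bar{\rho}$ is passed in by the caller, since the projected NFW profile itself is not implemented here. All names are illustrative.

```python
import math

C_KMS = 299792.458  # speed of light in km/s


def p_delta_z(delta_z, sigma_180, z_group):
    """Gaussian redshift distribution with dispersion sigma_180 (1 + z_group).

    sigma_180 is in km/s; dividing by c expresses it in redshift units
    (an assumption of this sketch).
    """
    s = sigma_180 * (1.0 + z_group) / C_KMS
    return math.exp(-0.5 * (delta_z / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)


def density_contrast(surface_over_mean, delta_z, sigma_180, z_group, h0=100.0):
    """P_M(R, dz) = (H0 / c) * [Sigma(R) / rho_bar] * p(dz).

    surface_over_mean : Sigma(R) / rho_bar in Mpc/h, supplied by the caller
    h0                : Hubble constant in h km/s/Mpc
    """
    return (h0 / C_KMS) * surface_over_mean * p_delta_z(delta_z, sigma_180,
                                                        z_group)
```

With these units the returned contrast is dimensionless, so it can be compared directly against a threshold.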

The method re-evaluates group membership by assigning each
galaxy to any group for which $P_M(R, \Delta z) \ge B$, where $B$ is a free
parameter. If a galaxy can be assigned to more than one group, it
is assigned to the group which maximizes $P_M(R, \Delta z)$, and if all
the members of one group can be assigned to another, the groups
are merged. Next the group centres are recomputed, and the entire
process is iterated until there is no further change in group
memberships. Finally, each group is assigned a new halo mass, $M_{180}$,
using abundance matching on $L_{19.5}$, and using the Tinker et al.
(2008) halo mass function for the Bolshoi cosmology and for the
halo definition used here. These newly derived masses are used to
update the relation between $M_{180}/L_{19.5}$ and $L_{19.5}$, and the entire
procedure is iterated until the $M_{180}$-$L_{19.5}$ relation has converged,
which typically takes only three to four iterations.

Using mock galaxy redshift surveys, Yang et al. (2005) tuned
the free parameter, $B$, by maximizing a measure of completeness of
groups, while minimizing the number of interlopers in groups. The
best-fit results were obtained for $B = 10$ for an SDSS main sample
like survey, which is also the value used for this study. In general
this will depend on spectroscopic completeness and redshift error.
We refer the reader to Yang et al. (2005) for a discussion of these
effects.


2.2.3 Tinker et al. group finder 


The galaxy group finder developed by Tinker et al. (2011) is a
modified version of that of Yang et al. (2005) described above. It
starts by estimating initial (sub-)halo masses for each galaxy using
the sub-halo abundance matching method. This method relates
the galaxy r-band luminosity to the mass of a (sub-)halo by assuming
a monotonic relation, with the brightest galaxies living in
the most massive (sub-)haloes, using the halo mass function from
Tinker et al. (2008) and the sub-halo mass function from Tinker &
Wetzel (2010). With this initial (sub-)halo mass estimate for each
galaxy, the virial radius and velocity dispersion are determined
using:


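$$R_{200} = \left( \frac{3 M_{200}}{4 \pi \, 200 \, \bar{\rho}} \right)^{1/3} \qquad (8)$$

and

$$\sigma_v = \sqrt{\frac{G M_{200}}{2 R_{200}}} \, (1 + z), \qquad (9)$$

where $M_{200}$ is defined as the mass of a SO halo with an average
overdensity of 200 times the mean background density, and $R_{200}$
is the corresponding radius of such a halo.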
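As a numerical illustration of equations (8) and (9), the sketch below evaluates the radius and velocity dispersion of a halo of given $M_{200}$. It assumes that equation (9) reads $\sigma_v = \sqrt{G M_{200} / (2 R_{200})}\,(1+z)$, that masses are in $h^{-1} M_\odot$ and lengths in $h^{-1}$ Mpc, and that the mean background density is the present-day value $\Omega_m \rho_{\rm crit}$ with $\Omega_m = 0.27$. The function names, the unit choices, and the default cosmological parameter are assumptions of this example, not values quoted in the text.

```python
import math

# Gravitational constant in Mpc (km/s)^2 / Msun.
G = 4.30091e-9
# Critical density today in h^2 Msun / Mpc^3.
RHO_CRIT = 2.775e11


def r200(m200: float, omega_m: float = 0.27) -> float:
    """Radius (Mpc/h) enclosing 200 times the mean background density."""
    rho_mean = omega_m * RHO_CRIT  # assumed present-day mean matter density
    return (3.0 * m200 / (4.0 * math.pi * 200.0 * rho_mean)) ** (1.0 / 3.0)


def sigma_v(m200: float, z: float = 0.0, omega_m: float = 0.27) -> float:
    """Velocity dispersion (km/s), under the assumed reading of eq. (9)."""
    return math.sqrt(G * m200 / (2.0 * r200(m200, omega_m))) * (1.0 + z)


# A halo of 10^12 Msun/h at z = 0.
radius = r200(1.0e12)
dispersion = sigma_v(1.0e12)
```

Because $R_{200} \propto M_{200}^{1/3}$, the dispersion scales as $M_{200}^{1/3}$, so a halo eight times more massive has twice the velocity dispersion.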

Once halo properties have been assigned to each galaxy, the
probability that the galaxy is a central or satellite galaxy in each
group is determined in the same way as in the Yang et al. group finder,
namely that a galaxy is assigned to be a satellite of a group when
$P_M(R, \Delta z) \geq B$. This algorithm is applied to the galaxies with
initial halo properties determined using abundance matching. After
the initial group memberships are assigned, halo mass is calculated
by abundance matching on total group luminosity and host
haloes only. This procedure is iterated until group memberships
remain unchanged. Group membership is sensitive to the choice of
the constant $B$, and this parameter must be chosen to best recover
the desired groups; the adopted value is $B = 10$. We do not retune
this parameter and note that the group finder was tuned on a mock with
haloes defined as SO with a mean internal density of 200 times the
mean density of the universe.


2.2.4 halo mass and central/satellite designation 

Once the group finders have been used to assign group memberships,
halo mass estimates and central/satellite designations are
made as follows. We assign each group a halo mass, $M_{\rm group}$,
using abundance matching on total group luminosity, $L_{\rm group}$, using
the Tinker et al. (2008) halo mass function for the Bolshoi
cosmology. Note that $L_{\rm group}$ is defined as the summed luminosity of
all assigned group members, which, by virtue of the volume-limited
nature of the mock galaxy redshift survey, are all brighter than
$M_r - 5 \log h = -19$. The particular mass function used is the one
for which haloes are defined as SO with an average internal density
200 times the mean background density, $M_{200}$. We use zero scatter
in the $L_{\rm group}$-$M_{\rm group}$ relation, rank ordering groups by
$L_{\rm group}$ and calculating the cumulative number density to assign
halo mass estimates such that
$n_{\rm group}(> L_{\rm group}) = n_{\rm halo}(> M_{200})$. Finally,
a galaxy is designated as a central galaxy if it is the brightest group
member, otherwise it is designated as a satellite.
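The zero-scatter abundance matching step can be sketched as below: groups are rank ordered by luminosity, each rank is converted to a cumulative number density, and that density is inverted through a tabulated cumulative halo mass function. The table used here is a toy power law, not the Tinker et al. (2008) mass function, and the function names, the survey volume, and the log-linear interpolation are all assumptions of this illustration.

```python
import math
from typing import List, Sequence


def invert_cumulative(n_target: float, log_m_grid: Sequence[float],
                      n_cum: Sequence[float]) -> float:
    """Return log10(M) such that n_halo(>M) = n_target.

    log_m_grid is ascending and n_cum (the cumulative abundance at each
    grid mass) is descending. Targets outside the table are clamped.
    """
    if n_target >= n_cum[0]:
        return log_m_grid[0]
    if n_target <= n_cum[-1]:
        return log_m_grid[-1]
    for k in range(len(n_cum) - 1):
        hi, lo = n_cum[k], n_cum[k + 1]
        if lo <= n_target <= hi:
            # Interpolate linearly in log10(n) between neighbouring grid points.
            t = (math.log10(hi) - math.log10(n_target)) / (math.log10(hi) - math.log10(lo))
            return log_m_grid[k] + t * (log_m_grid[k + 1] - log_m_grid[k])
    return log_m_grid[-1]


def abundance_match(l_group: Sequence[float], volume: float,
                    log_m_grid: Sequence[float], n_cum: Sequence[float]) -> List[float]:
    """Assign log10 halo masses so that n_group(>L_group) = n_halo(>M)."""
    order = sorted(range(len(l_group)), key=lambda i: l_group[i], reverse=True)
    log_m = [0.0] * len(l_group)
    for rank, i in enumerate(order, start=1):
        # The rank-th brightest group has cumulative number density rank / volume.
        log_m[i] = invert_cumulative(rank / volume, log_m_grid, n_cum)
    return log_m


# Toy cumulative mass function, n(>M) proportional to 1/M (illustrative only).
toy_log_m = [12.0, 13.0, 14.0, 15.0]
toy_n_cum = [1e-2, 1e-3, 1e-4, 1e-5]
log_m_group = abundance_match([5.0, 1.0, 3.0], 1.0e4, toy_log_m, toy_n_cum)
```

By construction the mapping is monotonic, so the brightest group always receives the largest mass, and any error in $L_{\rm group}$ propagates directly into $M_{\rm group}$.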

While we assign consistent halo mass estimates to groups in
all three group finders using the algorithm described above, each
group finder used in this study adopted a different halo definition
when its algorithm was tuned. We briefly summarize these differences
here and defer a discussion of the consequences to § 5.2. The Berlind
et al. group finder is tuned and tested on a mock where haloes are
defined using an FoF algorithm with a linking length 0.2 times the mean
interparticle separation. The Yang et al. group finder is tuned on a
mock where haloes are defined as SO with a mean internal density of
180 times the background density, while the Tinker et al. group finder
is tuned on a mock where haloes are defined as SO with a mean internal
density of 200 times the background density. None of these definitions
is the same as the one used to populate galaxies in our mock, where
haloes are defined as SO with a mean internal density of 360 times
the background density. We do not retune any group finder for this
study, and instead use each group finder as is. To facilitate
comparison, we assign halo mass estimates to groups with a single
algorithm, and use the same halo definition, namely $M_{200}$, as
described in the previous paragraph. When we make comparisons between
group finder results and the mock, we use an $M_{200}$ halo mass
measurement for haloes in the mock for consistency.

Finally, our mock and group finders assume that galaxy luminosity
is the primary galaxy property which drives the galaxy occupation
statistics. This is a common assumption, but stellar mass
could instead have been assumed to be the primary property. In
line with our mock, for our analysis we use galaxy luminosity to
estimate halo mass as described previously. Our choice minimizes
systematic errors associated with uncertainty in what property is
most applicable to the real universe, and thus our results represent
a best-case scenario for a group catalogue analysis. In general,
differing assumptions between the mock and the grouping analysis give
rise to systematic errors. We leave a discussion of this to § 5.3, but
we note that the systematic uncertainty in statistics dependent on
halo mass can be substantially impacted by this issue.


3 GROUP FINDER ERRORS 

In this section we describe the manner in which group
finding algorithms can fail (§ 3.1), review the classic purity and
completeness metrics (§ 3.2), and finally introduce a new technique to
comprehensively characterize group-finder errors: the halo transition
probability (HTP, § 3.3). When discussing the errors made by
group finders in partitioning galaxies into groups, it is often
necessary to make a distinction between the groups identified by a group
finder and the true groups present in the mock. When such a
distinction could be ambiguous, we reserve the term ‘group’ or ‘g’
for the result of a group finder and use the term ‘halo’ or ‘h’ to
refer to the truth. Furthermore, we will refer to the three group
finders in shorthand for the remainder of the paper as BERLIND-GF,
YANG-GF, and TINKER-GF for the Berlind et al., Yang et al., and
Tinker et al. group finders, respectively.


3.1 Generic Failure Modes of Group Finders 

3.1.1 membership allocation errors 

The first challenge of any group finder is to correctly partition
galaxies into their groups (i.e. to correctly identify group
memberships). We take an approach similar to that of Duarte & Mamon
(2014a) and identify two failure modes associated with this step, which
we call ‘fracturing’ and ‘fusing’⁴ (see Fig. 3). A group has been
fractured if galaxy members of a common halo have been placed in two or
more distinct groups. A group has been fused with another group
if any members from two distinct haloes have been assigned to one
and the same group. Note that a single group/halo can experience
both fracturing and fusing simultaneously, as illustrated in Fig. 3.
The effect of fusing and fracturing of a group will be key to
explaining the effects of group finding errors. From the outset, one
should expect fusing to result in a galaxy from a low mass halo being
identified in a higher mass group and vice versa for fracturing.
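Given a mock in which every galaxy carries both its true halo identifier and the group identifier assigned by a group finder, the two failure modes defined above can be flagged with a short bookkeeping routine. The sketch below is illustrative only; the function name and the flat list inputs are assumptions of this example.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Set, Tuple


def fracture_and_fuse(halo_id: Sequence[Hashable],
                      group_id: Sequence[Hashable]) -> Tuple[Set, Set]:
    """Return (fractured haloes, fused groups).

    A halo is fractured if its members are spread over two or more groups.
    A group is fused if it contains members of two or more distinct haloes.
    """
    groups_of_halo = defaultdict(set)
    haloes_of_group = defaultdict(set)
    for h, g in zip(halo_id, group_id):
        groups_of_halo[h].add(g)
        haloes_of_group[g].add(h)
    fractured = {h for h, gs in groups_of_halo.items() if len(gs) > 1}
    fused = {g for g, hs in haloes_of_group.items() if len(hs) > 1}
    return fractured, fused


# Six galaxies: halo 1 is split over groups 10 and 11, and group 11
# also swallows both members of halo 2, so both errors occur at once.
true_haloes = [1, 1, 1, 2, 2, 3]
found_groups = [10, 10, 11, 11, 11, 12]
fractured_haloes, fused_groups = fracture_and_fuse(true_haloes, found_groups)
```

The toy input reproduces the bottom row of Fig. 3, where a single system experiences fracturing and fusing simultaneously.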

3.1.2 central/satellite designation errors 

The second challenge for a group finder is to identify the central
galaxy of a group. In all three group finders considered here, we
follow Yang et al. (2007) and Tinker et al. (2011) and identify the
central galaxy as the brightest group member. Any group member
that is not a central galaxy is subsequently identified as a satellite.
Hence, every group will have one, and only one, central galaxy,
while the number of satellite galaxies can be any non-negative
integer.

This method for selecting centrals and satellites is motivated
by the idea that central galaxies grow in mass by cannibalizing
their satellites (Dubinski 1998; Cooray & Milosavljevic 2005) and
by being the repositories of cooling flows, whereas satellite galaxies
are subjected to a number of processes that quench star formation
(e.g., ram-pressure stripping, strangulation) and strip mass
(e.g., tidal stripping). If this methodology is adopted in galaxy group
catalogues, the resulting populations of centrals and satellites are
clearly distinct, in that they have different properties (e.g., colours,
star formation rates, AGN activity, morphologies) at fixed luminosity
or stellar mass (e.g., Weinmann et al. 2006; Skibba et al. 2007;
von der Linden et al. 2007; van den Bosch et al. 2008; Pasquali
et al. 2009, 2010; Skibba 2009; Hansen et al. 2009). In addition,
it has been shown that brightest group members do not obey extreme
value statistics, indicating that they are truly a special class
among the entire population of galaxies (see Hearin et al. 2013;
Shen et al. 2014 and references therein). Although this indicates
that identifying the brightest group members as centrals correctly
separates centrals from satellites in a statistical sense, it is
unlikely to be correct in each and every group. Indeed, van den Bosch
et al. (2005) and Skibba et al. (2011), using the phase-space statistics
of brightest group galaxies, have shown that in a significant fraction
of dark matter haloes, ranging from ~25 percent for Milky Way size
haloes to ~40 percent for cluster size haloes, the brightest group
member is a satellite rather than the true central. In haloes where
a satellite is truly brighter than the central, our method to identify
centrals is guaranteed to fail.

4 Duarte & Mamon (2014a) use the terms ‘fragmentation’ and ‘merging’
instead of ‘fracturing’ and ‘fusing’, respectively.

Figure 3. Illustration of group finder failure modes: fracturing (top), fusing
(middle), and both simultaneously (bottom). Solid circles on the left, labelled
‘halo(es)’, denote the boundaries of haloes, and the coloured points within
each circle indicate galaxies that truly live in that halo. Dashed circles on
the right, labelled ‘group(s)’, indicate the boundaries of group finder
identified haloes, and the coloured points within the dashed circles comprise
galaxy groups. The size of the bounding circle may be interpreted as halo/group
mass, the colour of filled circles as sSFR, and the size of filled circles as
an indication of whether the galaxy is a central or a satellite in its host
halo.

To characterize this phenomenon in our mock, we define the
central inversion fraction, the fraction of groups whose brightest
galaxy is not a central galaxy in the mock. In the limit where group
membership is determined perfectly, the central inversion fraction
can be written as:

$$f_{\rm cen,inv} = \frac{N_{\rm cen|sat}}{N_{\rm groups}}, \qquad (10)$$

where $N_{\rm cen|sat}$ is the number of true satellites identified as centrals.

Figure 4. Orange: the central inversion fraction for groups as a function of
halo mass, i.e. the fraction of groups whose brightest member is not a central
galaxy. Green: the satellite inversion fraction for groups, i.e. the fraction of
satellites that are the brightest member in a group. For groups where the true
central is not the most luminous member, our criterion for identifying centrals
will result in an error.


A closely related quantity is the satellite inversion fraction:

$$f_{\rm sat,inv} = \frac{N_{\rm sat|cen}}{N_{\rm sat}}. \qquad (11)$$

From the intrinsic $f_{\rm inversion}$ in the mock (see Fig. 4) we can read
off the minimum error expected in identifying centrals/satellites given
the designation criterion chosen for this study. If groups were
identified perfectly, in groups with a halo mass of $10^{13} M_\odot$,
roughly 10% would have a satellite misidentified as a central.
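In the limit of perfect group membership, where every group coincides with a halo, the central inversion fraction can be measured directly from a mock by locating the brightest member of each halo and checking whether it is the true central. The following is a minimal sketch under that assumption; the function name and the per-galaxy list inputs are hypothetical.

```python
from typing import Hashable, Sequence


def central_inversion_fraction(halo_id: Sequence[Hashable],
                               luminosity: Sequence[float],
                               is_central: Sequence[bool]) -> float:
    """Fraction of haloes whose brightest member is not the true central."""
    brightest = {}  # halo id -> index of its most luminous member
    for i, h in enumerate(halo_id):
        if h not in brightest or luminosity[i] > luminosity[brightest[h]]:
            brightest[h] = i
    n_inverted = sum(1 for i in brightest.values() if not is_central[i])
    return n_inverted / len(brightest)


# Four haloes; only in halo 1 does a satellite outshine the central.
toy_halo_id = [0, 0, 1, 1, 2, 3]
toy_luminosity = [3.0, 1.0, 2.0, 4.0, 5.0, 1.0]
toy_is_central = [True, False, True, False, True, True]
f_cen_inv = central_inversion_fraction(toy_halo_id, toy_luminosity, toy_is_central)
```

Binning the haloes by mass before calling the function would give the mass dependence plotted in Fig. 4.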

Note that any fusing or fracturing in the group finding process
(as defined above) will generally lead to an even larger number of
misclassifications. The fusing process results in two or more haloes
being associated with one group (see middle row in Fig. 3), while
fracturing results in one halo being associated with two or more
groups (see top row in Fig. 3). These cases will result, respectively,
in at least one central being identified as a satellite, and at least
one satellite being identified as a central. One of the goals of this
paper is to gauge how the misclassification of centrals and satellites
in galaxy group catalogues impacts the inferred colour-dependent
statistics of each population, e.g. central red fraction and galactic
conformity.


3.1.3 halo mass estimation errors 

The third and final challenge for galaxy group finders is to estimate
the halo mass for each individual group. In principle, this can
be done using a variety of techniques, such as satellite kinematics,
gravitational lensing, or X-ray observations. In practice, however,
each of these methods is only feasible for the most massive groups
and clusters. Typically, the only available information is group
richness (multiplicity), and the luminosities and line-of-sight
velocities of the member galaxies. In this paper we assign halo masses
to each identified group by using abundance matching on total group
luminosity (see § 2.2.4 for details), and we use exactly the same method
for each of the three group finding algorithms used in this paper.

Figure 5. $P(M_{\rm halo}|M_{\rm group})$: the distribution of halo masses,
$M_{\rm halo}$, from which galaxies assigned to a group of mass $M_{\rm group}$
are drawn, for the TINKER-GF (panel a) and in the case of perfect group
membership (panel b), where group luminosity is used to estimate halo mass.

This method to estimate halo mass for groups is fundamentally
limited in two ways. First of all, one expects intrinsic scatter in the
relation between group luminosity, $L_{\rm group}$, and halo mass,
$M_{\rm halo}$, which is not accounted for. Secondly, any errors in group
membership determination will generally result in errors in
$L_{\rm group}$, and thus in the inferred halo mass. Typically, a group
that is fractured will have an $L_{\rm group}$ that is too low, resulting
in an underestimate of its halo mass, while the opposite applies to fused
groups. The upper panel of Fig. 5 shows the relation between true halo
mass in the mock, $M_{\rm halo}$, and the inferred group mass,
$M_{\rm group}$, for individual galaxies in the TINKER-GF groups. The
results for the other two group finders are very similar. Note how most
galaxies are assigned masses that are approximately correct, in that they
lie close to the 1:1 locus (white line). However, there is also a
population of galaxies (in the upper left and bottom right regions) for
which the estimated mass is catastrophically wrong. These are galaxies
whose groups have experienced fusing or fracturing. This is evident from
the lower panel of Fig. 5, which shows the same results, but in the
absence of group membership errors, and which therefore reveals
the scatter in inferred group mass that arises purely from the intrinsic
scatter in the $M_{\rm halo}$-$L_{\rm group}$ relation.

Figure 6. The purity (solid lines) and completeness (dashed lines) of centrals
(orange) and satellites (green) as a function of group/halo mass for each group
finder: BERLIND-GF groups (left), TINKER-GF groups (middle), YANG-GF groups
(right).

In general, the error that arises from the intrinsic scatter in the
$M_{\rm halo}$-$L_{\rm group}$ relation will not have a significant impact
on the average relation between some group property, $Q$, and group mass;
i.e., the inferred $\langle Q|M_{\rm group} \rangle$ will be similar to the
true $\langle Q|M_{\rm halo} \rangle$, as long as the scatter in the
$M_{\rm halo}$-$L_{\rm group}$ relation is uncorrelated with $Q$. If, on
the other hand, such a correlation is present, the intrinsic scatter in
the $M_{\rm halo}$-$L_{\rm group}$ relation will introduce a systematic
error in the inferred $\langle Q|M_{\rm group} \rangle$. To see this,
consider an example in which $Q$ is positively correlated with
$L_{\rm group}$ at fixed $M_{\rm halo}$. At that fixed halo mass, groups
with a high value of $Q$ will have a relatively large $L_{\rm group}$, and
their inferred group mass, $M_{\rm group}$, will therefore be
systematically overestimated. Similarly, groups with low $Q$ will be
assigned group masses that are systematically biased low. Such systematic
biases are difficult to control without additional, independent
constraints on the masses of the haloes in which the groups reside, and it
is important to be aware of these potential shortcomings.


3.2 Purity and Completeness 

Group finding is an archetypal example of a problem in which one
attempts to identify a special sub-sample, a distinct galaxy group,
from a larger population, all other galaxies. For any such problem,
two natural questions that arise are “how contaminated is the
sub-sample with incorrectly identified members?”, and “how often do
true members of the sub-sample fail to be included by the selection
algorithm?” The former question is conventionally quantified by
the “purity” of the sub-sample, the latter by the “completeness” of
the sub-sample. In practice, there are various ways in which one
could apply these two metrics to characterize the errors made by
group finders.

At first glance, the most straightforward approach to measure
purity and completeness would seem to be to calculate each on a
group-by-group basis. This approach requires uniquely associating
individual haloes in the mock to individual groups in a group
catalogue so that one may ask a question like “what fraction of
galaxies in a group truly belong to that group?” (a measure of purity)
or “what fraction of galaxies in a halo are correctly assigned to a
group?” (a measure of completeness). Algorithmically, measuring
completeness requires looking up a halo in the mock and tagging
the galaxy members, then looking up the group associated with that
halo in a group catalogue and counting how many of those galaxies
were tagged as being members of the antecedent halo, relative
to the number that were not tagged. The measurement of purity is
done by proceeding in the opposite direction. This endeavour becomes
a non-trivial task in the presence of fusing or fracturing,
where multiple groups and haloes can be associated.

In light of this, and given our focus on the occupation statistics
rather than the properties of individual groups, we abandon a
group-by-group approach in favour of a different one. Instead, we
consider the purity and completeness of our group catalogues focusing on
the sub-samples of centrals and satellites, identification of which
is one of the primary goals of a group finder. In this case, it is no
longer necessary to explicitly connect groups to haloes, as every
galaxy has a unique mapping between central/satellite condition
in the mock and central/satellite designation in a group catalogue.
Now, the appropriate question, e.g. for purity, becomes “what
fraction of satellites (or centrals) in a group catalogue are truly
satellites (or centrals) in the mock?”. While this is a fundamentally
different measure than the purity and completeness of the group
memberships, we will show that the two questions are intimately
related.

We define the completeness of satellite galaxies identified by
a group finder as:

$$C_{\rm sat} = \frac{N_{\rm sat|sat}}{N_{\rm sat|sat} + N_{\rm cen|sat}}, \qquad (12)$$

and the completeness of central galaxies as:

$$C_{\rm cen} = \frac{N_{\rm cen|cen}}{N_{\rm cen|cen} + N_{\rm sat|cen}}, \qquad (13)$$

where, for example, $N_{\rm sat|cen}$ is the number of galaxies identified as
satellites but which are centrals in the mock. In a complementary
fashion, we define the satellite purity as:

$$P_{\rm sat} = \frac{N_{\rm sat|sat}}{N_{\rm sat|sat} + N_{\rm sat|cen}}, \qquad (14)$$

and the central purity as:

$$P_{\rm cen} = \frac{N_{\rm cen|cen}}{N_{\rm cen|cen} + N_{\rm cen|sat}}. \qquad (15)$$

Additionally, we briefly note that the total number of galaxies and
inferred groups with this formalism is given by:

$$N_{\rm gal} = N_{\rm cen|cen} + N_{\rm sat|cen} + N_{\rm sat|sat} + N_{\rm cen|sat}, \qquad (16)$$

$$N_{\rm groups} = N_{\rm cen|cen} + N_{\rm cen|sat}. \qquad (17)$$

We plot the completeness and purity as a function of group mass
for both centrals and satellites for each group finder in Fig. 6, where
purity is a function of assigned group mass (purity calculations start
by looking up galaxies in identified groups) and completeness is a
function of true halo mass (completeness measurements begin by
looking up galaxies in haloes).
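Because every galaxy has exactly one true and one assigned central/satellite label, the purity and completeness of both sub-samples follow from four counts. The sketch below takes $N_{a|b}$ to be the number of galaxies identified as $a$ that are truly $b$ in the mock, with completeness normalized by the true population and purity by the identified population. The function names and the boolean list inputs are assumptions of this example, which treats the whole sample as a single mass bin.

```python
from typing import Dict, Sequence


def confusion_counts(true_is_central: Sequence[bool],
                     found_is_central: Sequence[bool]) -> Dict[str, int]:
    """Count galaxies by 'identified|true' central/satellite label."""
    counts = {"cen|cen": 0, "cen|sat": 0, "sat|cen": 0, "sat|sat": 0}
    for truth, found in zip(true_is_central, found_is_central):
        key = ("cen" if found else "sat") + "|" + ("cen" if truth else "sat")
        counts[key] += 1
    return counts


def purity_completeness(n: Dict[str, int]) -> Dict[str, float]:
    """Purity (P) and completeness (C) of the central and satellite samples."""
    return {
        # Completeness: correctly identified / all true members of the class.
        "C_sat": n["sat|sat"] / (n["sat|sat"] + n["cen|sat"]),
        "C_cen": n["cen|cen"] / (n["cen|cen"] + n["sat|cen"]),
        # Purity: correctly identified / all galaxies assigned to the class.
        "P_sat": n["sat|sat"] / (n["sat|sat"] + n["sat|cen"]),
        "P_cen": n["cen|cen"] / (n["cen|cen"] + n["cen|sat"]),
    }


# Ten galaxies: four true centrals followed by six true satellites.
toy_truth = [True] * 4 + [False] * 6
toy_found = [True, True, True, False] + [False] * 4 + [True] * 2
toy_counts = confusion_counts(toy_truth, toy_found)
toy_metrics = purity_completeness(toy_counts)
```

To reproduce curves like those in Fig. 6, the counts would be accumulated separately in bins of assigned group mass (for purity) and of true halo mass (for completeness).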

We can understand the physical significance of the various
measures of purity and completeness by considering the effect of
fusing and fracturing on each measurement. Because centrals
outnumber satellites, any fusing is likely to transform a central galaxy
into a satellite. This can be seen by considering the case of pure
fusing during the group finding process (middle row in Fig. 3).
This process will result in two true centrals being identified in
one group. By definition, one of those centrals will be classified
as a satellite in the resulting group. Similarly, any fracturing
is likely to transform a satellite into a central galaxy. It follows
that fusing generally decreases the purity (completeness) of satellites
(centrals) and fracturing generally decreases the purity
(completeness) of centrals (satellites). While it is possible to
construct circumstances where a combination of fusing and fracturing
results in no decrease in the purity and completeness measures (two
groups exchanging centrals or satellites only), these appear to be
sub-dominant failure pathways.

No group finder results in perfect purity or completeness of
the central or satellite sub-samples. This is the first indication that
some fusing and fracturing unavoidably takes place in the
group finding process. For the YANG-GF and TINKER-GF groups,
$P_{\rm sat}$ and $C_{\rm sat}$ are generally lower than the corresponding
values for centrals, becoming approximately equivalent at high mass. This
indicates that satellites are generally more difficult to identify. This
is the expected result for the identification of a sub-dominant
population, a less extreme example of the perennial problem of locating
a needle in a haystack. On the other hand, we note that the
BERLIND-GF results in a particularly low satellite purity, with the
benefit of producing a very high satellite completeness, ~90%.
This is an indication that a significant amount of fusing is taking
place with minimal fracturing in the group finding process. The
near equality of the $P$ and $C$ measurements for the other two group
finders indicates that both processes are acting in near equilibrium.

Figure 7. The satellite transition factor, $T_{\rm sat}$, as a function of
group/halo mass for each group finder. The filled circles show the value for
the full sample.

We note that all group finders show a trend of decreasing $P_{\rm cen}$
and $C_{\rm cen}$ towards increasing group mass. At least one factor in
this trend is the increasing central inversion fraction for
high mass haloes (shown in Fig. 4). This causes group finders to
misidentify central galaxies, even in correctly identified groups. In
the limit where group membership is perfectly determined,
$P_{\rm cen} = 1 - f_{\rm cen,inv}$. Thus, $f_{\rm cen,inv}$ sets an upper
limit to $P_{\rm cen}$ that no group finder is likely to exceed. That is,
the algorithm to identify central galaxies in groups is fundamentally
limited, and our measurements of $P_{\rm cen}$ and $C_{\rm cen}$ are simply
reflecting this fact.

Lastly, we define a quantity closely related to the purity and
completeness measures, the satellite transition factor, $T_{\rm sat}$, a
measure of the excess probability that a central galaxy in the mock is
identified as a satellite galaxy by the group finder over the inverse
process, as:

$$T_{\rm sat} = \frac{C_{\rm sat}}{P_{\rm sat}} = \frac{N_{\rm sat|sat} + N_{\rm sat|cen}}{N_{\rm sat|sat} + N_{\rm cen|sat}}. \qquad (18)$$

If $T_{\rm sat} = 1$, then for each satellite that is misidentified as a
central, a central is misidentified as a satellite
($N_{\rm sat|cen} = N_{\rm cen|sat}$). If $T_{\rm sat} \neq 1$ then the
inferred satellite fraction,

$$f_{\rm sat}^{\rm inferred} = \frac{N_{\rm sat|sat} + N_{\rm sat|cen}}{N_{\rm gal}}, \qquad (19)$$

will deviate from the true mock satellite fraction,

$$f_{\rm sat}^{\rm mock} = \frac{N_{\rm sat|sat} + N_{\rm cen|sat}}{N_{\rm gal}}, \qquad (20)$$

where $f_{\rm sat}^{\rm inferred} = T_{\rm sat} f_{\rm sat}^{\rm mock}$. If
more centrals transition to satellites, the inferred satellite fraction
will increase, and vice versa.
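The role of the satellite transition factor as a multiplicative correction between the true and inferred satellite fractions can be checked numerically. The sketch below uses hypothetical counts, with $N_{a|b}$ denoting galaxies identified as $a$ that are truly $b$; the function name and the dictionary layout are assumptions of this example.

```python
from typing import Dict


def satellite_transition_factor(n: Dict[str, int]) -> float:
    """T_sat: identified satellites divided by true satellites."""
    return (n["sat|sat"] + n["sat|cen"]) / (n["sat|sat"] + n["cen|sat"])


# Hypothetical counts for ten galaxies (keys are 'identified|true').
counts = {"cen|cen": 3, "cen|sat": 2, "sat|cen": 1, "sat|sat": 4}
n_gal = sum(counts.values())

t_sat = satellite_transition_factor(counts)
f_sat_inferred = (counts["sat|sat"] + counts["sat|cen"]) / n_gal
f_sat_mock = (counts["sat|sat"] + counts["cen|sat"]) / n_gal
```

In this toy case more satellites are promoted to centrals than centrals are demoted, so $T_{\rm sat} = 5/6$ and the inferred satellite fraction (0.5) falls below the true one (0.6).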

We plot the value of $T_{\rm sat}$ as a function of group mass for
each group finder in Fig. 7. As expected from the purity and
completeness measurements, the BERLIND-GF has an elevated $T_{\rm sat}$,
increasing for lower mass groups. The YANG-GF also displays a
slightly elevated $T_{\rm sat}$, but it remains constant over the mass
range considered. From each of these observations, we expect the
satellite fractions inferred from those two group catalogues to be
elevated (see § 4.3).

The difference in the value of $T_{\rm sat}$ for each group finder has
its origins in the halo definitions and mocks used to tune the
parameters of each group finder. Group finders tuned on mocks using
halo definitions that result in larger, more extended haloes relative
to those in our mock will result in a group finder with a higher
$T_{\rm sat}$, and vice versa. We will return to this point in a
discussion on the role of halo definitions in group finding in § 5.2.


3.3 The Halo Transition Probability 

In this section we introduce a new statistic to quantify the errors
made by group finding algorithms. It encapsulates the effect of
all three error modes discussed in this paper and, in particular,
captures their correlated nature. The halo transition probability
(HTP hereafter) measures the probability that a galaxy, with specified
halo properties, transitions to a group, with specified group
properties. Given that the primary environmental dependence of
our focus is halo mass, we formulate our new statistic in terms of
the underlying halo mass and assigned group mass of individual
galaxies.

Our formulation of the HTP measures the probability that a
galaxy assigned to a group of mass $M_{\rm group}$ is truly located in a
halo of mass $M_{\rm halo}$, i.e. $P(M_{\rm halo}|M_{\rm group})$.⁶ For
example, the HTP can answer the question “what is the probability that a
galaxy estimated to occupy a halo of mass $10^{14} M_\odot$ in a group
catalogue truly occupies a $10^{12} M_\odot$ halo?” However, this only
characterizes errors in halo mass, the estimation of which is one of the
three primary goals of group finding. In addition to halo mass, we are
interested in the sub-populations of central and satellite galaxies. We
therefore also measure the transition probabilities of central/satellite
designations in the group finding process, e.g. the probability that a
galaxy identified as a central in a group of mass $M_{\rm g}$ is truly a
satellite in a halo of mass $M_{\rm h}$,
$P({\rm sat_h}, M_{\rm h}|M_{\rm g}, {\rm cen_g})$. The functions that
compose the HTP are:

$$P({\rm cen_h}, M_{\rm h}|M_{\rm g}, {\rm cen_g}) \qquad (21)$$

$$P({\rm sat_h}, M_{\rm h}|M_{\rm g}, {\rm cen_g}) \qquad (22)$$

$$P({\rm sat_h}, M_{\rm h}|M_{\rm g}, {\rm sat_g}) \qquad (23)$$

$$P({\rm cen_h}, M_{\rm h}|M_{\rm g}, {\rm sat_g}), \qquad (24)$$

where the normalization is such that:

$$\int \left[ P({\rm cen_h}, M_{\rm h}|M_{\rm g}, {\rm cen_g}) + P({\rm sat_h}, M_{\rm h}|M_{\rm g}, {\rm cen_g}) \right] {\rm d}M_{\rm h} = 1$$

$$\int \left[ P({\rm sat_h}, M_{\rm h}|M_{\rm g}, {\rm sat_g}) + P({\rm cen_h}, M_{\rm h}|M_{\rm g}, {\rm sat_g}) \right] {\rm d}M_{\rm h} = 1 \qquad (25)$$
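A binned estimate of the HTP that respects the normalization in equation (25) can be sketched as follows: galaxies are counted in cells of true designation and true halo mass, separately for each column of assigned designation and assigned group mass, and each column is then normalized to unity. The function name, the dictionary-based output, and the simple floor-based binning in log mass (0.2 dex by default) are assumptions of this illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Cell = Tuple[str, int]


def halo_transition_probability(log_m_halo: Sequence[float],
                                log_m_group: Sequence[float],
                                true_is_central: Sequence[bool],
                                found_is_central: Sequence[bool],
                                bin_width: float = 0.2) -> Dict[Cell, Dict[Cell, float]]:
    """Binned HTP: {(found type, group-mass bin): {(true type, halo-mass bin): P}}.

    Mass bins are labelled by the integer floor(log10 M / bin_width).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for mh, mg, truth, found in zip(log_m_halo, log_m_group,
                                    true_is_central, found_is_central):
        column = ("cen" if found else "sat", math.floor(mg / bin_width))
        cell = ("cen" if truth else "sat", math.floor(mh / bin_width))
        counts[column][cell] += 1
    htp = {}
    for column, cells in counts.items():
        # Normalize over both true designations and all halo-mass bins.
        total = sum(cells.values())
        htp[column] = {cell: n / total for cell, n in cells.items()}
    return htp


# Four galaxies identified as centrals of log M_group = 12.5 groups: three are
# true centrals of such haloes, one is a true satellite of a 10^14.1 halo.
toy_htp = halo_transition_probability(
    log_m_halo=[12.5, 12.5, 12.5, 14.1],
    log_m_group=[12.5, 12.5, 12.5, 12.5],
    true_is_central=[True, True, True, False],
    found_is_central=[True, True, True, True],
)
```

In this toy column the misidentified galaxy sits well above the 1:1 locus, the pattern expected for a satellite that has been split off from a massive halo.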


The HTP can be thought of as a fingerprint of a specific group
finder, characterizing the mapping between true properties of galaxies
and the inferred properties in a group catalogue. Each group
finder will have a different HTP in detail, inasmuch as algorithms
differ, and in general, the HTP will be dependent on the underlying
mock. That is, the response of the group finder will be dependent
on the underlying model. We will return to this point in a discussion
of the prospects for using group catalogues in future studies of
galaxy group properties in § 5.4.

6 This relation can also be inverted: $P(M_{\rm group}|M_{\rm halo}) =
P(M_{\rm halo}|M_{\rm group}) P(M_{\rm group}) / P(M_{\rm halo})$.

To show the morphology of the HTP, we plot an example,
the YANG-GF run over our mock, in Fig. 8 (the HTPs for the
TINKER-GF and BERLIND-GF are qualitatively similar).
Considering the first quadrant, we describe the meaning of the HTP as
follows: given that a galaxy is identified as a central in a group of
mass $M_{\rm group}$, the probability that it is truly a central and
resides in a halo of mass $M_{\rm halo}$ is given by the value of the HTP
at $(M_{\rm g}, M_{\rm h})$, and similarly for the remaining quadrants.

The upper panels of the HTP in Fig. 8 (i, ii) show that galaxies
with the correctly identified central/satellite designation display a
distribution of mock halo masses centred along the 1:1 line, i.e. the
group finder can recover an unbiased estimation of the halo mass
for these galaxies. The bottom panels in Fig. 8 (iii, iv) show
incorrectly identified centrals and satellites. Galaxies incorrectly
identified as centrals, i.e. satellites in the mock, cluster in the
upper-left region of the plot. These galaxies are systematically
assigned group masses that are below their true host halo mass. This is
a telltale signature of fracturing. Galaxies which are incorrectly
identified as satellites, i.e. centrals in the mock, cluster in the
lower-right region of the plot. These galaxies are systematically
assigned group masses that are above their true host halo mass. This is
a telltale signature of fusing. Thus the morphology of the HTP is
consistent with our picture of fusing and fracturing.

To show the relation of the HTP to the purity and completeness
measures discussed in the previous section, in the bottom panels of
Fig. 8 we plot the purity of centrals and satellites
($P_{\rm cen}$, $P_{\rm sat}$) as a function of $M_{\rm group}$, which
involves an integration of the HTP.

Another group finding error that the HTP illustrates
clearly is the misidentification of centrals when a satellite
is the brightest group member. In such a circumstance, one galaxy
will likely transition from a satellite in the mock to a central in the
group catalogue without affecting the estimate of group mass. This
is exactly what is shown by the HTP along the 1:1 line in the lower left
quadrant (iii) and to a lesser extent in quadrant (iv).

By examining the HTP, two classes of colour-dependent errors
should be expected. The fusing mode in group identification
is dominated by low mass centrals being assigned to higher mass
groups and misidentified as satellites. Low mass central galaxies
are likely to be blue compared to satellites in higher mass groups.
Thus, the resulting decrease in satellite purity should be expected
to lower the inferred red fraction of satellites. Conversely, the
fracturing error mode is dominated by satellites in higher mass haloes
being assigned to low mass groups and misidentified as centrals in
the process. The decrease in purity of the central galaxies is then
likely to increase the red fraction of centrals. We examine this
specific result in § 4.5.


4 RESULTS 

In this section we present our principal results pertaining to the
fidelity with which group finders recover the true, underlying, statistical
trends in our mock catalogues. We begin in § 4.1 with a formal
description of our technique for comparing the true statistical
trends of the underlying galaxy distribution to those trends inferred
by the group-finder(s). In § 4.2 - § 4.6 we examine a set of occupation
statistics: the HOD, satellite fraction, CLF, red fraction, and
1-halo conformity.
Figure 8. Halo transition probability for the YANG-GF. The upper left quadrant (i) corresponds to central galaxies in the mock which were identified as central
galaxies by the group finder, $P(M_{\rm h}, {\rm cen_h} \mid {\rm cen_g}, M_{\rm g})$. The upper right quadrant (ii) corresponds to satellite galaxies in the mock which were identified as
satellite galaxies by the group finder, $P(M_{\rm h}, {\rm sat_h} \mid {\rm sat_g}, M_{\rm g})$. The lower left quadrant (iii) corresponds to satellite galaxies in the mock which were identified
as central galaxies by the group finder, $P(M_{\rm h}, {\rm sat_h} \mid {\rm cen_g}, M_{\rm g})$. The lower right quadrant (iv) corresponds to central galaxies in the mock which were
identified as satellite galaxies by the group finder, $P(M_{\rm h}, {\rm cen_h} \mid {\rm sat_g}, M_{\rm g})$. The x-axis is the estimate of halo mass assigned using the output of the group
finders and the y-axis is the halo mass from the mock. The orange colour coding shows the frequency of galaxies placed in 0.2 by 0.2 dex mass bins, where
the frequencies sum to one along the vertical axis at each group mass bin. The lower panels show the purity of the centrals (left) and satellites (right) as a
function of group mass.
4.1 Technique for Comparing Groups to Haloes

We have identified three general categories of group finding failures
(see § 3):

(i) halo mass estimation errors 

(ii) central/satellite designation errors 

(iii) halo membership assignment errors 

In the following analysis, we use these three types of errors individually,
or in conjunction, to describe in detail how group finding
algorithms introduce systematic errors in the measurements of various
colour-dependent occupation statistics.

It is often difficult to disentangle these three error modes to
identify the primary culprit responsible for the error in the inferred
statistic. It is particularly difficult to isolate the effect of membership
errors, because membership errors generally result in errors in
mass estimation and central/satellite designation. However, in the
absence of membership errors, it is easy to isolate the other two
errors. To assist in this task, we have created three versions of a
“perfect” group finder. In each version the group membership is taken
directly from the mock (i.e. all galaxies which reside in a common
halo are exclusively assigned to a common group), but we
apply either (1) our method of halo mass estimation, (2)
the central/satellite designation criteria, or (3) both simultaneously.
Otherwise, all galaxy(-group) properties are taken directly from the
mock. In the first case, we can isolate the effect of our method to
estimate group masses. In the second case, we can isolate the effect
of our method to identify the central galaxy (or conversely
the effect of a non-zero central inversion fraction in the mock), and
in the third case we can see the combination of these two effects
in the absence of membership errors. We show the results of the
“perfect” group finder(s) to further clarify, when appropriate, which
error modes are important for a statistic. In some cases it is not
possible to decouple these error modes, and an appeal to all three
working in concert must be made to explain the observed error.

In particular, throughout § 4 we formulate our comparisons
as follows. First, we choose some particular statistic of the galaxy
distribution, e.g., $\langle L_{\rm cen} \rangle$, and use our mock galaxy catalogue to
measure the true dependence of this statistic on halo mass, $M_{\rm halo}$.
Second, we repeat this exercise, but instead use the group catalogue
to infer the dependence of the same statistic on $M_{\rm group}$. Comparing
the inferred and true dependence allows one to assess how effectively
group catalogues can be used to directly measure the statistical
relationship between galaxy(-group) properties and halo properties.
All error bars are calculated from 50 bootstrapped group samples
drawn from the resulting group catalogues.
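A minimal sketch of such a bootstrap is given here. It assumes that whole groups are resampled with replacement and that the error bar is the standard deviation over the resamples; the text does not spell out these details, and the function name and the `statistic` callback are hypothetical.

```python
import numpy as np

def bootstrap_group_errors(group_ids, statistic, n_boot=50, seed=0):
    """Scatter of `statistic` over bootstrap resamples of whole groups.

    group_ids : group label of every galaxy in the catalogue
    statistic : callable taking an array of galaxy indices and returning
                a number (or an array of binned values)
    """
    rng = np.random.default_rng(seed)
    group_ids = np.asarray(group_ids).ravel()
    unique_ids, inverse = np.unique(group_ids, return_inverse=True)
    # Galaxy indices belonging to each group, so groups are drawn as units
    order = np.argsort(inverse, kind="stable")
    members = np.split(order, np.cumsum(np.bincount(inverse))[:-1])
    samples = []
    for _ in range(n_boot):
        # Draw as many groups as the catalogue contains, with replacement
        draw = rng.integers(0, len(unique_ids), size=len(unique_ids))
        idx = np.concatenate([members[g] for g in draw])
        samples.append(statistic(idx))
    return np.std(samples, axis=0, ddof=1)
```

Resampling groups rather than individual galaxies keeps the members of a group together, so that correlated errors within a group are reflected in the error bar.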

4.2 The HOD 

We begin our discussion of recovered statistics with the HOD. In
the HOD formalism, the occupation statistics of galaxies are encoded
by $P(N_{\rm gal} \mid M_{\rm halo})$, the probability that a halo of mass $M_{\rm halo}$
hosts $N_{\rm gal}$ galaxies satisfying some sample selection criteria, such
as a brightness and/or a colour cut. In this section, we study how
effectively one may use group catalogues to directly infer the first
moment of the HOD, namely $\langle N_{\rm gal}(M_{\rm halo}) \rangle$.
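As an illustration, the first moment can be estimated by dividing the number of selected galaxies by the number of hosts in each mass bin. The sketch below assumes one host-mass entry per galaxy and one mass entry per host (halo or group); the function name and argument layout are hypothetical.

```python
import numpy as np

def mean_occupation(log_m_host_per_galaxy, log_m_per_host, edges, select=None):
    """<N_gal | M> in mass bins: selected galaxies per host.

    log_m_host_per_galaxy : host mass for every galaxy
    log_m_per_host        : mass of every host, including hosts that
                            contain no selected galaxy
    select                : optional boolean mask (e.g. red galaxies only)
    """
    m_gal = np.asarray(log_m_host_per_galaxy, dtype=float)
    if select is not None:
        m_gal = m_gal[np.asarray(select, dtype=bool)]
    n_gal = np.histogram(m_gal, bins=edges)[0].astype(float)
    n_host = np.histogram(log_m_per_host, bins=edges)[0]
    # Bins without hosts are returned as NaN rather than zero
    return np.divide(n_gal, n_host, out=np.full_like(n_gal, np.nan),
                     where=n_host > 0)
```

For the mock, `log_m_per_host` should include empty haloes; a group catalogue contains no empty groups, so the inferred mean occupation of the full sample cannot fall below one in any populated bin.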

The results of the direct measurement of the HOD from the
group finders (points) and the relation taken directly from the mock
(solid lines) are shown in Fig. 9. The HOD was measured for three
samples, the volume limited $M_r - 5 \log h < -19$ sample (top row),
the red sub-sample (middle), and the blue sub-sample (bottom). This was
done for each group finding algorithm: BERLIND-GF groups (left
column), TINKER-GF groups (middle), and YANG-GF groups
(right).

An organizing principle for much of what follows is that
group finder errors tend to equalize the properties of distinct galaxy
sub-populations. Using this intuition, we can explain the specific
effects seen in the HODs measured directly using the three group
finders. Focusing initially on the HODs for the full galaxy sample
(shown in the top row of Fig. 9), we note two important effects.
First, the inferred transition from $\langle N_{\rm gal}(M) \rangle = 0$ to $\langle N_{\rm gal}(M) \rangle = 1$
is too sharp. This is not a failure of the grouping algorithms per se,
but an unavoidable limitation of our implementation of abundance
matching to estimate $M_{\rm group}$. This can be seen by examining the
dashed line, representing the result using a “perfect” group finder
that uses our implementation of halo mass estimation. As discussed
in the section on halo mass estimation errors, abundance matching
with no scatter on total group luminosity will result in a HOD with
a step function at the low mass end, a feature that is not present in
our mock.
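The origin of the step can be seen in a toy version of abundance matching with no scatter. The sketch below is an assumption-laden simplification: it rank-orders groups by total luminosity and assigns masses from a list of halo masses drawn from the same volume, in place of whatever halo mass function the actual implementation uses. The name `abundance_match` is hypothetical.

```python
import numpy as np

def abundance_match(group_lum, halo_masses):
    """Assign halo masses to groups by rank order in total luminosity.

    The most luminous group receives the most massive halo, the second
    most luminous the second most massive, and so on (zero scatter).
    """
    group_lum = np.asarray(group_lum, dtype=float)
    halo_sorted = np.sort(np.asarray(halo_masses, dtype=float))[::-1]
    if len(halo_sorted) < len(group_lum):
        raise ValueError("need at least as many haloes as groups")
    # rank[i] = position of group i when sorted by decreasing luminosity
    order = np.argsort(-group_lum, kind="stable")
    rank = np.empty(len(group_lum), dtype=int)
    rank[order] = np.arange(len(group_lum))
    return halo_sorted[rank]
```

Because the mapping is monotonic, the faintest group in the sample sets a hard minimum assigned mass: every assigned mass above it hosts at least one galaxy and no group lies below it, which produces the step in the inferred HOD.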

Also apparent in the full sample HOD for the BERLIND-GF
groups is an overestimation of $\langle N_{\rm gal} \mid M \rangle$ at intermediate to high
masses. This is a consequence of the BERLIND-GF over-fusing
groups relative to our mock, resulting in a high satellite transition
factor, $T_{\rm sat} \sim 1.75$. This results in the satellite fraction being
too high, and an overestimate of $\langle N_{\rm gal} \mid M \rangle$. The TINKER-GF
and YANG-GF have $T_{\rm sat} \sim 1$ and do not exhibit this same error,
or at least not to the same magnitude. We stress that $T_{\rm sat} > 1$
for the BERLIND-GF groups is likely a consequence of how the
group finder was tuned, particularly the mismatch between the FoF
halo galaxy populations it was tuned to reproduce, and the SO halo
galaxy populations present in our mock. We leave a more thorough
discussion of the role of halo definitions in group finding to § 5.2.

The colour split HODs, red galaxies (middle row) and blue
galaxies (bottom) in Fig. 9, show further effects that appear to be
a more general result of all three grouping algorithms. Principally,
each group finder displays an inability to capture the shape of the
colour-dependent HOD for the region dominated by central galaxies
at the low halo mass end, and all group finders predict far too
many blue galaxies in the region dominated by satellite galaxies in
higher mass haloes. These two errors originate in two distinct limitations
of group finders, namely halo mass estimation errors in the
region dominated by central galaxies, and membership errors in the
region dominated by satellites.

Abundance matching on total group luminosity to estimate
halo masses for groups with $\langle N_{\rm gal} \rangle \sim 1$ is not able to capture the
complex relation between galaxy colour and halo properties seen in
our mock. That is, in the mock there is a correlation between halo
mass and colour at fixed luminosity, so an estimate based on only
luminosity misses this correlation. The result of a perfect group
finder that uses the halo mass estimation algorithm is shown as a
dashed line, perfectly matching the YANG-GF and TINKER-GF
results at low halo masses. Again, this indicates that using group luminosity
to assign halo mass estimates for groups with $\langle N_{\rm gal} \rangle \sim 1$
cannot altogether capture complex relations at fixed luminosity (see
§ 3.1.3 for a more detailed theoretical discussion of this issue).

The error in the regime where $N_{\rm gal} > 1$ has an altogether
different origin. Membership errors will generally result in impurities
in the satellite population. Because most galaxies are central
galaxies, which are on average bluer than satellite galaxies at
fixed luminosity, fusing of central galaxies into larger groups will
result in an increased number of blue centrals identified as satellite
galaxies in groups. Similarly, fracturing of groups is likely to result
in incompleteness in the red satellite population, resulting in a
decreased number of red satellites in groups.

Figure 9. The HOD for the group finders run over the mock. The top row is for all galaxies in the mock, the middle for red galaxies, and the bottom for blue
galaxies. The left column is for the BERLIND-GF, the middle for the TINKER-GF, and the right for the YANG-GF. Dark lines are taken from the mock, and
points with error bars are the group finder results. The dotted line is the result of a “perfect” group finder, where halo masses are estimated by abundance
matching on total group luminosity. The error bars are calculated from 50 bootstrapped samples of the group catalogues.

Figure 10. The satellite fraction as a function of galaxy luminosity for red, blue, and all galaxies compared to the mock for each group finder: Berlind et al.
groups (left), Tinker et al. groups (middle), and Yang et al. groups (right). Results for the full sample of galaxies are shown in black, the red sub-sample in red,
and the blue sub-sample in blue. The group finder results are shown as points with error bars, while the intrinsic mock halo level statistics are shown as solid
lines. The result of a “perfect” group finder (where group membership is equivalent to that in the mock) is shown as dotted lines. The inferred statistic uses
only the satellite/central determination from the group finders.

These trends appear in the inferred HOD measured by the
group finders. All three group finders infer an increased number
of blue galaxies in high mass haloes as a result of fusing-induced
impurity in the satellite sample. However, the inferred number of
red galaxies is much closer to the true value for all three group
finders, with subtle differences. The group finder with the highest
satellite completeness, BERLIND-GF, infers more red galaxies in
these haloes than the group finder with the lowest satellite completeness,
TINKER-GF.


4.3 Satellite Fractions 

The satellite fraction of galaxies meeting a particular selection criterion
is an important quantity for many studies including galaxy-galaxy
lensing, the pairwise velocity dispersion of galaxies, cluster
finding, and the large scale clustering of galaxies. Group catalogues
provide a way to directly infer this quantity. Here we measure
the red and blue satellite fraction as a function of galaxy luminosity,
$f_{\rm sat}(L \mid {\rm red})$ and $f_{\rm sat}(L \mid {\rm blue})$. Note that this statistic does
not make use of the halo mass estimation of groups.
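A minimal sketch of this measurement follows. It assumes per-galaxy arrays of (log) luminosity, a boolean satellite flag, and an optional colour mask; the function name and arguments are hypothetical.

```python
import numpy as np

def satellite_fraction(lum, is_sat, edges, select=None):
    """f_sat in luminosity bins for the galaxies picked out by `select`.

    Passing a red (blue) mask as `select` gives f_sat(L|red) (f_sat(L|blue)).
    Only the central/satellite flag is used, never the group mass.
    """
    lum = np.asarray(lum, dtype=float)
    is_sat = np.asarray(is_sat, dtype=bool)
    if select is None:
        keep = np.ones(lum.shape, dtype=bool)
    else:
        keep = np.asarray(select, dtype=bool)
    n_all = np.histogram(lum[keep], bins=edges)[0].astype(float)
    n_sat = np.histogram(lum[keep & is_sat], bins=edges)[0].astype(float)
    # Empty luminosity bins are returned as NaN
    return np.divide(n_sat, n_all, out=np.full_like(n_all, np.nan),
                     where=n_all > 0)
```

Running the same function once with the mock designations and once with the group finder designations gives the true and inferred curves to be compared.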

In Fig. 10 we plot the satellite fraction for red and blue galaxies
from our mock (solid lines) and as measured by the group finders
(points). Also, we show, as a dashed line, the result of a “perfect”
group finder where the satellite/central designation is determined
using the same criteria as the group finders.

BERLIND-GF groups infer a satellite fraction that is high relative
to the mock, as one would expect given $T_{\rm sat} \sim 1.75$. The
excess fusing affects both the red and blue galaxy samples. Here
we can also see that the YANG-GF groups over-estimate the satellite
fraction to a lesser degree, affecting both red and blue galaxies.
The TINKER-GF groups recover the statistic very accurately at all
but large luminosities, where satellite inversion becomes important.
At first glance, it is surprising that the group finder inferred
satellite fraction is not significantly under-estimated for red galaxies
and over-estimated for blue galaxies. However, because age-matching
results in 2-halo conformity, the neighbouring galaxies
to any group in the mock are more similar in colour to the group
members than the average galaxy in the mock. Because of this,
any membership error will produce less of an error in a colour-dependent
statistic than if a random galaxy had been merged into a
group. This is how each group finder can have a significant amount
of fusing and fracturing while keeping the relative number of red
to blue satellites approximately equal. This is a specific prediction
of age-matching, and a mock that does not have any 2-halo conformity
will generally result in a larger colour-dependent error than
the one measured here. We will come back to this point in § 5.1.1.

4.4 The CLF 

The conditional luminosity function (CLF hereafter),
$\Phi(L \mid M_{\rm halo})\,{\rm d}L$, is the average number of galaxies in the luminosity
range $[L, L + {\rm d}L]$ that occupy a halo of mass $M_{\rm halo}$. Group
catalogues, in principle, provide a manner in which the CLF can be
directly measured (e.g. Yang et al. 2008). For our measurements of
the CLF, we have made use of all properties of the group finding
algorithms: membership determination, halo mass estimation, and
central/satellite designation. Therefore, the CLF is
the first statistic discussed so far which in principle could suffer
from any combination of all three general group finding errors:
membership misidentification, halo mass estimation errors, and
central/satellite designation errors.

In Fig. 11 we show the CLF in three mass bins (columns)
as measured by the YANG-GF. The full CLF (top row) has been
split into a central (middle row) and satellite component (bottom
row), and further split into red (red points/lines) and blue (blue
points/lines) sub-samples of the latter two components. Largely, the
CLF is recovered remarkably well for the non-colour split and the
red sub-sample at all three halo masses considered here. Where
errors are made, we see a similar effect in the CLF measurements
as in previously discussed statistics, namely that group
finder errors tend to equalize the properties of distinct galaxy
sub-populations. In this case, the group finder inferred measurements of
the central galaxy component of the CLFs are narrower, and the
blue and red populations are inferred to be more equal than those in
the mock. This is a result of designating the brightest group member
the central galaxy. A group where the central has been misidentified
is likely to be one where the central has a relatively low luminosity.
Thus this algorithm to identify centrals will generally prematurely
truncate the distribution of low central luminosities and
consequently result in a more peaked distribution (the number of
centrals per group is conserved). The corollary to this is that the
satellite component falls off more rapidly towards higher luminosities
in the group finder inferred statistic (which we also see).

Figure 11. Conditional luminosity functions (CLF) for the YANG-GF run over our mock. The group finder results are shown as points with error bars. The
intrinsic mock results are shown as lines. First row: the black line and points are for the full halo/group population. Second row: the dashed red line and red
points with error bars are for red central galaxies only, the blue lines and blue points with error bars are for blue central galaxies only. Bottom row: the red line
and red points with error bars are for red satellites only. The blue line and blue points with error bars are for blue satellites only.

Figure 12. The inferred red fraction of galaxies as a function of galaxy luminosity for centrals and satellites compared to the mock for each group finder:
Berlind et al. groups (left), Tinker et al. groups (middle), and Yang et al. groups (right). Results for centrals are shown in orange and satellites in green. The
group finder results are shown as points with error bars, while the intrinsic mock halo level statistics are shown as solid lines. The results of a “perfect” group
finder (where group membership is equivalent to that in the mock) are shown as dotted lines. The inferred statistic uses only the satellite/central determination
from the group finders.

The satellite component is most affected by membership determination
errors resulting in impurities and incompleteness in the
blue and red satellite sub-samples. Groups that have undergone fusing
receive boosts in the number of blue satellites, and fracturing is
most likely to result in a red satellite being identified as a central.
This affects the measurement of the CLF most strongly by overestimating
the luminosity function for the sub-dominant blue satellites,
bringing the measurement closer to that of the red satellites.

4.5 Red Fractions 

The red fraction of satellites and centrals provides an important
constraint on galaxy evolution models (e.g. van den Bosch et al.
2008; Peng et al. 2012; Wetzel et al. 2014). In principle, group
finders provide a method to directly measure this statistic. In this
section we discuss the recoverability of $f_{\rm red|cen}$ and $f_{\rm red|sat}$ as a
function of galaxy luminosity (see Fig. 12) and halo/group mass
(see Fig. 13) by group finding algorithms.

First, we examine the red fraction as a function of galaxy
luminosity in Fig. 12. The inferred red fractions of satellites and
centrals tend to converge towards a common value, where, in general,
the red fraction of satellites, $f_{\rm red|sat}(L_{\rm gal})$, is under-estimated,
and the red fraction of centrals, $f_{\rm red|cen}(L_{\rm gal})$, is over-estimated.
As expected from an examination of the CLF and satellite fractions,
at high luminosities, where central inversion becomes
important, the red fraction of satellites is over-estimated relative
to the mock. Interestingly, the BERLIND-GF measurements under-estimate
the red fraction of centrals. This is strongly affected by the
2-halo conformity present in the age-matching mock. Because the
BERLIND-GF groups suffer a large amount of fusing, a significant
population of centrals is merged into groups. The centrals that are
likely to merge into larger groups are those that live in more dense
environments, and the remaining low luminosity centrals that were
not merged are more likely to be blue. This phenomenon also affects
the YANG-GF and TINKER-GF groups, where the under-estimate
of the satellite red fraction is significantly reduced and there is almost
no error in the inferred red fraction of centrals.

In contrast, the $f_{\rm red|sat}(M_{\rm group})$ and $f_{\rm red|cen}(M_{\rm group})$ statistics
(Fig. 13) display a more interesting behaviour in the deviation
from the mock value. The satellite red fraction continues to
be underestimated by the group finders, consistent with the previous
measurements, but the measured central red fraction deviates significantly
from the true underlying behaviour in the mock. In particular,
the sharp transition between low and high central red fraction
at $\sim 10^{13}\,M_\odot$ is completely washed out by the group finding algorithms.
This is primarily a consequence of mass estimation
errors at low and intermediate masses and central identification errors
at high masses, and not of membership determination
errors: all three group catalogues produce nearly identical measurements
of the central red fraction as a function of group mass. This
may have important implications for the inference of quenching
mass scales, and we will return to a discussion of the consequences
of this result in § 5.1.1.

Figure 13. The inferred red fraction of galaxies as a function of group/halo mass for central galaxies and satellite galaxies compared to the mock for each
group finder: Berlind et al. groups (left), Tinker et al. groups (middle), and Yang et al. groups (right). Results for centrals are shown in orange and satellites
in green. The group finder results are shown as points with error bars, while the intrinsic mock halo level statistics are shown as solid lines. The results of a
“perfect” group finder, where halo masses and central/satellite designation are determined in the same way as the group finders and where group membership
is equivalent to that in the mock, are shown as dotted lines. The inferred statistic uses the satellite/central determination and halo mass estimation from the group
finders.

4.6 1-Halo Conformity 

A primary motivation of this paper is to investigate the ability
of group catalogues to study the galactic conformity phenomenon.
1-halo galactic conformity was originally detected using the
YANG-GF in Weinmann et al. (2006). Consistent with that detection,
we define 1-halo conformity as the tendency of red central
galaxies to host redder satellites than blue centrals at a fixed halo
mass. Galactic conformity serves as a strong test of galaxy evolution
and formation models as it violates the HOD formalism. Age-matching
naturally gives rise to 1-halo conformity, and our mock is
ideally suited to test the sensitivity of group finders to conformity.

However, it is also desirable to test the group finders on a
mock which contains no conformity in order to ascertain if measurements
inferred from group finders have a tendency to induce a
false conformity measurement. For this purpose, we create a shuffled
version of the age-matching mock which preserves the HOD,
but wherein the conformity phenomenon is removed by shuffling
centrals and satellites between haloes of equal mass, destroying the
central-satellite colour correlation at fixed halo mass (see Appendix
A for details on how this mock was created).
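One simple way to realise such a shuffle is sketched below. This is not the procedure of Appendix A, which is not reproduced here; it is a hypothetical illustration that permutes central galaxies among haloes in narrow mass bins while each halo keeps its own satellites. The function name and the 0.1 dex bin width are assumptions.

```python
import numpy as np

def shuffle_centrals(log_m_halo, bin_width=0.1, seed=0):
    """Permute central galaxies among haloes of (nearly) equal mass.

    Returns an index array `src` such that halo i receives the central
    galaxy that originally lived in halo src[i]. Satellites stay in place,
    so the occupation numbers at fixed halo mass are unchanged while the
    central-satellite colour pairing is randomised.
    """
    rng = np.random.default_rng(seed)
    log_m_halo = np.asarray(log_m_halo, dtype=float)
    mass_bin = np.floor(log_m_halo / bin_width).astype(int)
    src = np.arange(len(log_m_halo))
    for b in np.unique(mass_bin):
        idx = np.flatnonzero(mass_bin == b)
        # Swap centrals only within this mass bin
        src[idx] = rng.permutation(idx)
    return src
```

A shuffled central colour array is then `central_colour[src]`. Narrower bins keep the mass dependence of the colour fractions more exact, at the cost of fewer exchange partners per halo.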

In Fig. 14 we plot the blue fraction of satellites in haloes
(groups) with red central galaxies as red lines (red points) and
those with blue central galaxies as blue lines (blue points). This
is done for each group finder in the three columns for both the age-matching
mock, which contains conformity, in the top row, and for
the shuffled mock, which does not contain conformity, in the bottom
row. The conformity “signal” in these plots is the separation
between blue lines (points) and red lines (points).
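The signal described above can be computed as in the following sketch, which assumes one entry per satellite giving the host mass, whether the satellite is blue, and whether the host's central is red. The function name and input layout are hypothetical.

```python
import numpy as np

def conformity_signal(log_m_host, sat_is_blue, cen_is_red, edges):
    """Blue satellite fraction around blue and red centrals, and their gap.

    Returns (f_blue around blue centrals, f_blue around red centrals,
    difference), each evaluated in the host-mass bins given by `edges`.
    """
    log_m_host = np.asarray(log_m_host, dtype=float)
    sat_is_blue = np.asarray(sat_is_blue, dtype=bool)
    cen_is_red = np.asarray(cen_is_red, dtype=bool)

    def f_blue(sel):
        n = np.histogram(log_m_host[sel], bins=edges)[0].astype(float)
        n_blue = np.histogram(log_m_host[sel & sat_is_blue],
                              bins=edges)[0].astype(float)
        return np.divide(n_blue, n, out=np.full_like(n, np.nan), where=n > 0)

    around_red = f_blue(cen_is_red)
    around_blue = f_blue(~cen_is_red)
    # A positive difference at fixed mass indicates 1-halo conformity
    return around_blue, around_red, around_blue - around_red
```

Using halo masses and mock designations gives the intrinsic signal; using group masses and group finder designations gives the inferred one.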

Immediately, we conclude that all three group finders are sensitive
to the existence of 1-halo conformity in the age-matching
mock; however, all three group finders also show a smaller, but still
significant, amount of conformity in the shuffled mock, where there
is no intrinsic conformity. First, we consider the measurements on
the mock with conformity. The group finders are generally successful
at recovering this particular measure of 1-halo conformity
in terms of recovering the magnitude of the $f_{\rm blue}$ separation
between red and blue centrals; however, the overall normalization
of the measured blue fraction is slightly high. At the lower
mass end, $M \lesssim 10^{12.5}\,M_\odot$, the blue fraction of galaxies around
red centrals is significantly over-estimated, eventually resulting in
the inferred conformity signal going to nearly zero. We attribute
this trend to a limitation in the halo mass assignment method. To
demonstrate this, in Fig. 14 we have plotted the result of a perfect
group finder as dotted lines, where we have used the total group luminosity
to estimate group mass. We will return to this mass error
point when describing the effects seen in the shuffled mock.

Now, we turn our attention to the measurement on the shuffled
mock. All three group finders show an elevated blue satellite
fraction for all masses compared to the mock. In addition, all
three group finders show an increased blue satellite fraction around
blue central galaxies at fixed inferred group mass for Mg roU p 
10 14 M 0 relative to that inferred around red centrals (induced con¬ 
formity). The first effect, the overall elevated blue satellite fraction, 
is a repercussion of the removal of 2-halo conformity in the shuffled
mock. It is the presence of 2-halo conformity in the age-matching
mock which reduces the effect of fusing and fracturing on the inferred
red and blue satellite fractions discussed in previous sections.
Without this mitigating effect, the colour errors are generally larger.

Figure 14. 1-halo galactic conformity measured in the age-matching mock (top) and the shuffled mock (bottom) for the BERLIND-GF groups (left),
TINKER-GF groups (middle), and YANG-GF groups (right). The blue fraction of satellites is plotted against halo/group mass for groups/haloes with a
red central galaxy (red) and blue central galaxy (blue). The intrinsic mock halo statistics are shown as lines, and the group finder results are shown as points
with error bars. The dotted lines are the result of a “perfect” group finder. The conformity “signal” is the separation between blue lines (points) and red lines
(points). No separation implies no 1-halo conformity, and conversely, separation implies 1-halo conformity.

There are two effects which contribute to the measured conformity
signal in the shuffled mock. First, because the red fraction
of central and satellite galaxies increases with halo mass, a false
1-halo conformity measurement can be induced when groups are
subject to either fracturing or fusing (any membership failure will
result in one or both of these failure modes being present). Consider
a group that has been purely fractured: its parts will systematically
be assigned a halo mass below its members’ true halo mass,
resulting in both the assigned central and satellite populations being
redder than correctly identified groups of the same assigned
halo mass. Conversely, the members of a purely merged group will be
assigned a halo mass that is larger than their true halo mass,
resulting in a group with bluer centrals and satellites relative to correctly
identified groups of the same halo mass (examine the galaxy
colours in Fig. 3). From this, we conclude that any membership
errors generally result in an induced or elevated conformity signal.

Second, the relative over (under-)estimation of the satellite
blue fraction could have been expected around blue (red) centrals
for our mocks given the relation between galaxy colour and
halo mass at fixed luminosity. In Fig. 15 we show the relation between
halo mass and central luminosity for red and blue centrals in the
age-matching mock (which is equivalent in the shuffled version). This
phenomenon occurs in our luminosity based mock because at fixed
$V_{\rm peak}$ there is a non-random correlation between halo age and
mass. Estimating group mass using only luminosity results in a
colour dependent mass error because the scatter in the $M_{\rm halo} - L_{\rm group}$
relation is correlated with galaxy colour (see the last paragraph
of § 3.1.3 for a more theoretical discussion). Specifically, in
haloes more massive than $10^{12.5}\,M_\odot$, where the satellite fraction
begins to become significant for our sample, at a fixed central luminosity,
blue centrals occupy lower mass haloes than red centrals.
At a given assigned $M_{\rm group}$, groups with blue centrals are likely
to truly occupy lower mass haloes than groups with red centrals,
and because more massive haloes have redder satellite populations
than lower mass haloes, this systematic mass estimation error results
in an induced conformity signal. We show the magnitude of
this error in the conformity signal measured using a perfect group
finder as dotted lines in Fig. 14. This is the same effect, but in the
opposite direction, which is responsible for the overestimation of
the satellite blue fraction around red centrals in the age-matching
mock at low masses (top panel). Put another way, correlation between
galaxy colour and halo mass at fixed luminosity can induce
a conformity signal or alter the magnitude of the signal.

Figure 15. Mock halo mass vs. mock central luminosity for red centrals
(red points and markers with error bars indicating the scatter) and for blue
centrals (blue points and markers with error bars indicating the scatter). At
fixed central luminosity, blue centrals occupy lower mass haloes than red
centrals for haloes more massive than $\sim 10^{12}\,M_\odot$ (and vice versa for lower
mass haloes). In the limit where group luminosity is determined by central
luminosity, at equal assigned group mass, groups with a blue central will
truly be associated, on average, with a lower mass halo than groups with a
red central.


5 DISCUSSION 

5.1 Implications for Group Catalogue Studies 

5.1.1 Central and Satellite Quenching 

It is known from observations that quenched galaxies are more
likely to have larger stellar masses and live in higher galaxy overdensities
than star forming galaxies. Developing a model which reproduces
the quenched fraction of galaxies as a function of environment
and luminosity/stellar mass remains a challenge for galaxy
formation models. Galaxy group catalogues provide a valuable resource
to study this phenomenon, allowing one to isolate the effect
of environment and distinguish between central and satellite populations.

As we have seen, galaxy group catalogues tend to equalize the
properties of distinct galaxy populations, with galactic conformity
being a subtle exception. This becomes an important bias to understand
when comparing the occupation dependence on halo mass
of star forming and quenched galaxies. We have shown that group
finders tend to decrease the inferred red fraction of satellites, while
increasing it for central galaxies. Furthermore, by examining this
statistic using various shufflings of our mock (e.g. the ‘shuffled’
mock described in Appendix A) we see that the magnitude of the
error is dependent on the underlying model.


The quenched fraction of satellites and centrals (or number) as
a function of halo mass has been measured (Yang et al. 2008, 2009;
Weinmann et al. 2006, 2009; Wetzel et al. 2012; Peng et al. 2012;
Knobel et al. 2014) using group catalogues without accounting for
systematic errors associated with making these measurements. Instead,
group finders should be forward modelled, by running the
group finder over galaxy mocks, to make fair comparisons to observations
(e.g. Skibba et al. 2011; Wetzel et al. 2013; Hearin &
Watson 2013; Watson et al. 2015). We see that for this particular
statistic, the dominant source of systematic error in our inference
comes from correlated scatter in galaxy colour at fixed central luminosity
and central inversion in our mock. We leave a thorough
study of the inference of central quenching statistics to a forthcoming
paper.

Even if the error for a relevant measure in our study is small, it
is worth noting that this is significantly influenced by 2-halo
conformity. We believe 2-halo conformity is a realistic phenomenon
to include, and that for the purposes of this study it is accurately
included in our mock, but it may be a poor crutch to rely upon to
correct for group finding errors, especially for galaxy properties
other than colour or sSFR. We can estimate the importance of this
effect for the satellite red fraction shown in §4.5 as follows. If
group membership errors are random (galaxies that are mislabelled as
satellites are drawn from the total population of centrals), then, in
terms of purity and completeness (see §3.2), we can write the
expected group catalogue inferred red satellite fraction for the
entire sample as

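$$ f^{\rm inf}_{\rm sat|red} = \left[ C_{\rm sat}\, f^{\rm mock}_{\rm red|sat}\, N^{\rm mock}_{\rm sat} + (1 - C_{\rm cen})\, f^{\rm mock}_{\rm red|cen}\, N^{\rm mock}_{\rm cen} \right] / (f_{\rm red} N_{\rm gal}), $$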

where, for example, $f^{\rm mock}_{\rm red|sat}$ is the fraction of
satellites in the mock which are red. Using this calculation, one
would expect the TINKER-GF inferred red (blue) satellite fraction to
be under-estimated (over-estimated) by ~11% (29%); empirically,
however, it is under-estimated (over-estimated) by only ~4% (12%)
for the full sample. For a mock with weaker 2-halo conformity, the
colour-dependent satellite fractions will have a larger error.
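
The random-error estimate above can be sketched numerically. The function below is a minimal illustration, not the code used for this paper. It assumes that the satellite completeness is the fraction of true satellites labelled as satellites, and that one minus the central completeness is the fraction of true centrals mislabelled as satellites. All argument names are hypothetical, and any numbers passed in are purely illustrative.

```python
def expected_inferred_sat_fraction(c_sat, c_cen,
                                   f_colour_given_sat, f_colour_given_cen,
                                   n_sat, n_cen):
    """Expected inferred satellite fraction for one colour sub-sample.

    Assumes random membership errors: galaxies mislabelled as satellites
    are drawn at random from the mock centrals (hypothetical helper).
    """
    # True numbers of satellites and centrals of this colour in the mock.
    n_colour_sat = f_colour_given_sat * n_sat
    n_colour_cen = f_colour_given_cen * n_cen
    # Correctly recovered satellites plus centrals mislabelled as satellites.
    inferred_sat = c_sat * n_colour_sat + (1.0 - c_cen) * n_colour_cen
    # Normalize by the total number of galaxies of this colour.
    return inferred_sat / (n_colour_sat + n_colour_cen)


def true_sat_fraction(f_colour_given_sat, f_colour_given_cen, n_sat, n_cen):
    """Intrinsic satellite fraction for the same colour sub-sample."""
    n_colour_sat = f_colour_given_sat * n_sat
    n_colour_cen = f_colour_given_cen * n_cen
    return n_colour_sat / (n_colour_sat + n_colour_cen)
```

Comparing the two return values for a red and a blue sub-sample gives the fractional over- or under-estimate expected if membership errors were uncorrelated with colour.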


5.1.2 Galactic Conformity 

A primary motivation for our study of group finding errors was to
investigate the ability of galaxy group catalogues to study the
galactic conformity phenomenon. In §4.6 we measured the blue fraction
of satellites around blue and red centrals as a function of halo mass.
This is consistent with the original measurement of conformity by
Weinmann et al. (2006), who find that at fixed halo mass, groups
with a passive central galaxy have, on average, a more passive
satellite population than groups with a star forming central galaxy.
We find that for our mock, one with strong conformity of the
magnitude measured by Weinmann et al. (2006), group finders largely
recover the 1-halo conformity signal. However, we also found that
group finders tend to induce a small conformity signal in a mock
constructed to have no conformity. The primary difficulties in
measuring 1-halo conformity are twofold: accounting for systematic
halo mass differences in groups with red and blue centrals, and
the effect of membership errors in groups that result from fusing
and fracturing. As we have seen, both of these can induce a false
conformity-like signal.

A subsequent measurement of 1-halo conformity using the
Yang et al. group catalogue was made by Knobel et al. (2014),
who further control for additional environmental variables beyond
halo mass, finding a result in qualitative agreement with Weinmann
et al. (2006), namely that satellites around quenched centrals
display an increased quenching efficiency parameter compared to
those around star forming centrals. Knobel et al. (2014) attribute the
persistence of the galactic conformity signal to “hidden variables”
not controlled for in the sample selection when comparing satellite
samples, including uncertainties in the estimation of halo mass, or
physical parameters like halo formation time (i.e. $z_{\rm form}$ in
Hearin & Watson 2013). Differentiating between physical effects and
systematics in measuring group properties remains a challenge.

The problems involved in using galaxy group catalogues to
measure 1-halo conformity extend to studies which use alternative
methods. Phillips et al. (2014a,b) instead use an isolation
criterion, in essence a type of conservative group finder, to select
spectroscopic pairs of an $L_*$ central and a $0.1L_*$ satellite in
the SDSS. With this sample, they study the statistical properties of
the satellite populations around star forming and quiescent $L_*$
galaxies. They find that star forming central galaxies have satellite
populations that are more star forming than those of quiescent
centrals at fixed central stellar mass. They point out that their
measurement of conformity could be explained if quenching efficiency
has a strong dependence on halo mass, especially for haloes of
~$10^{12}\,{\rm M}_\odot$. Beyond halo mass estimation difficulties,
Phillips et al. (2014b) use a correction scheme based on purity and
completeness, similar to the P and C measures discussed in §3.2, to
account for interlopers in their satellite sample. This correction
may over- or under-correct the inferred statistics, or be more or
less applicable to the sample split into star forming and passive
sub-samples.

The primary challenge for conformity measurements is to iso¬ 
late samples of star forming and quiescent galaxies at fixed halo 
mass. Group catalogues provide one method to attempt this, along 
with isolation criteria, but they are subject to systematic effects of 
the type discussed in this paper. Satellite velocity distributions and 
lensing measurements around groups or ‘isolated’ galaxies may 
provide one mechanism to reduce systematic uncertainties in fu¬ 
ture measurements of conformity. 

5.2 The Role of Halo Definitions 

The group finding process is sensitive to the precise halo definition 
adopted. In a simulation, the boundary of a halo will define which 
dark matter structures are subhaloes (those within the boundary), 
and which are neighbouring haloes (those outside the boundary). If 
a group finder is optimized to recover a galaxy population residing 
in a halo of a particular definition, and the model that the results are 
being compared to defines haloes in a different manner, the results 
are difficult to interpret. On the other hand, this problem gets to 
the root of the challenge facing any group finder, namely optimiz¬ 
ing the group finder to recover physically meaningful groups while 
minimizing membership errors. 

Each group finder discussed in this paper was tuned on a dif¬ 
ferent set of mocks with a different set of criteria setting the tun¬ 
ing parameters. In particular, the YANG-GF and TINKER-GF were 
tuned on SO halo based mocks to reproduce SO halo galaxy pop¬ 
ulations but with different optimization criteria. The former min¬ 
imizes interlopers and maximizes completeness on the group to 
group level while the latter is optimized to reproduce the full HOD 
of the mock. The BERLIND-GF was tuned on FoF halo based 
mocks, with a different set of optimization goals geared towards 
reproducing the FoF halo population characteristics of higher mass 
groups than the YANG-GF and TINKER-GF. 

One result of these different halo definitions and optimization
schemes is apparent throughout much of this paper. The YANG-GF
and TINKER-GF groups are often in closer agreement with the mock
values (with SO haloes), especially for measurements of occupation
statistics sensitive to the overall number of galaxies in a halo,
such as $\langle N_{\rm gal}|M_{\rm vir}\rangle$ and $f_{\rm sat}$.
This is because the BERLIND-GF groups are geared to reproduce FoF
haloes, which trace different structures than SO haloes.
Additionally, the BERLIND-GF is optimized to reproduce relatively
large groups compared to the other two group finders. The result is
that the BERLIND-GF group catalogue has an elevated satellite
fraction of ~45% relative to the mock's intrinsic value of ~25%.

While the number of galaxies in haloes is sensitive to the halo
definition used, the relative abundance of red and blue galaxies
need not be affected in the same way. The age-matching mock
used for this study assigns galaxy colour based on a measure of
halo age, $z_{\rm form}$, and the age parameter of satellites is very
rarely affected by their accretion into a host halo. The result is
that there is no abrupt colour transition between satellites and
galaxies outside their halo, with weak colour gradients on the 1-halo
to 2-halo scale. This is a manifestation of the 2-halo galactic
conformity phenomenon. As discussed throughout the paper, it is this
2-halo conformity phenomenon which results in smaller errors in the
group finder inferred colour-dependent occupation statistics as a
result of interlopers than one would expect if there were no 2-halo
colour correlations. A corollary to this is that small changes to the
boundary of a halo should not drastically affect the relative
abundance of red and blue satellites in groups, i.e. measurements of
$f_{\rm red}$.

We do not attempt to retune any of the group finders used in 
this paper for a few reasons. First, it is not possible to retune a group 
finder on the real universe, so not retuning on our particular mock 
gives a more useful picture of the kind and magnitude of the errors 
group finders are likely to make. Second, it is interesting to note the 
effect of the different optimization goals and halo definitions on the 
outcome of the colour-dependent statistics measured here. Further¬ 
more, we use the group finders as they are because they have often 
been used in the literature with the tunings used here, and we be¬ 
lieve it is important to show the results as they are with no further 
optimization. It is an interesting question to ask which group find¬ 
ing method and set of tuning parameters is best suited to recover a 
particular occupation statistic, but such a study is beyond the scope 
of this work. 

5.3 Additional Systematics 

Our mock and group finders assume that galaxy luminosity is the
primary galaxy property which determines the galaxy occupation
statistics. This is a common assumption, but another galaxy
property, namely stellar mass, could instead have been assumed to be
the primary property. We have tested that the qualitative features of
the mock remain unchanged if instead we use stellar mass as the
primary galaxy property (see Hearin et al. 2014a). Using a ‘perfect’
group finder analysis similar to the method discussed in §4.1,
we have checked that the three error modes described in this paper
remain unchanged in a stellar mass based mock. As the primary
goal of this paper is not to re-tune group finders, as discussed
previously, we do not retune each one to use stellar mass as the halo
mass indicator (see Yang et al. 2007 for a discussion of this effect).

Yang et al. (2008) examine the effect of using stellar mass
and luminosity as the primary galaxy property on inferring the
halo mass dependence of the central galaxy quenched fraction and
the colour split satellite fraction (see Fig. 5 and 11 therein,
respectively). The difference between the two measures can be
significant when the statistic is dependent on the halo mass
estimation, indicating that this can be a significant and even
dominant source of uncertainty. In a forward modelling approach, the
trends that one is able to infer from a group catalogue will only be
as accurate as the mock and the power of the observational statistic.
The ability to distinguish between galaxy luminosity and stellar mass
as the primary galaxy property driving halo occupation requires
further work.

Nevertheless, we briefly discuss the effect of using the
‘incorrect’ property (in the case of our mocks, stellar mass) on our
group analysis. That is, what is the effect of assuming stellar mass
as the primary halo occupation indicator for an analysis of group
finders run over our mock, where luminosity was assumed to be the
primary indicator? We examined the effect using a ‘perfect’ group
finder analysis, assigning every galaxy a stellar mass using the
relation between mass-to-light ratio and colour from Bell et al.
(2003), as used by Yang et al. (2007),

$$ \log\left[M_*/(h^{-2}\,{\rm M}_\odot)\right] = -0.406 + 1.097\left[^{0.0}(g-r)\right] - 0.4\left(^{0.0}M_r - 5\log h - 4.64\right) \qquad (27) $$


where the superscript 0.0 indicates that the quantity is k-corrected
to z = 0.0. We then use total group stellar mass to estimate the halo
mass of groups. The primary error induced by this assumption is that
redder galaxies are inferred to occupy more massive haloes at fixed
luminosity, the effect being strongest in groups with fewer members.
To see the magnitude of this error, in Fig. 16 we show the comparison
between group masses calculated both ways. The systematic mass
error between red and blue galaxies can be of the order of 0.5 dex.
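
To make the colour dependence of Eq. (27) concrete, the relation can be written as a short function. This is a minimal sketch; the function and argument names are hypothetical, the inputs are assumed to be already k-corrected to z = 0.0, and the magnitude and h are inserted literally as they appear in Eq. (27).

```python
import math


def log_stellar_mass(g_minus_r, abs_mag_r, h=1.0):
    """Return log10[M*/(h^-2 Msun)] following Eq. (27).

    g_minus_r : (g - r) colour, k-corrected to z = 0.0
    abs_mag_r : r-band absolute magnitude, k-corrected to z = 0.0
    h         : dimensionless Hubble parameter used in the 5 log h term
    """
    return (-0.406
            + 1.097 * g_minus_r
            - 0.4 * (abs_mag_r - 5.0 * math.log10(h) - 4.64))


def colour_mass_offset(red_colour, blue_colour):
    """Stellar mass difference (dex) between two colours at fixed magnitude."""
    return 1.097 * (red_colour - blue_colour)
```

At fixed magnitude, the offset depends only on the colour difference, which is why a luminosity based mock analysed with stellar mass based halo masses shifts groups with red members to higher inferred mass.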

One interesting result of this concerns galactic conformity,
where the mass bias effect is strong enough to infer a significantly
reduced galactic conformity signal in our mock with intrinsic
conformity. No conformity signal is induced in the mock without
intrinsic conformity. This occurs because groups with red centrals are
inferred to be more massive than their true mass, bringing their
intrinsically higher red fraction of satellites more in line with
that of groups whose true mass equals the inferred mass. This is
contrary to the intrinsic conformity signal in a version of our mock
built using stellar mass and sSFR. This result is also contrary to
empirical analyses of SDSS group catalogues, which continue to
display 1-halo galactic conformity when using stellar mass as the
halo mass indicator (see Kauffmann et al. 2013; Knobel et al. 2014;
Phillips et al. 2014a). We leave an exhaustive examination of 1-halo
conformity in SDSS to a forthcoming paper.

It is also important to note that we have neglected to include
a model for spectroscopic incompleteness in our study. The
most significant source of incompleteness in the SDSS is due to the
fact that two fibers used to obtain spectra cannot be placed within
55 arcsec of each other. Approximately 7% of galaxies are affected in
the SDSS main sample, disproportionately affecting galaxies in
regions of high density. Galaxies for which redshifts are not
obtained as a result are referred to as fiber collisions. These
galaxies are often included in the spectroscopic galaxy sample used
to build group catalogues by using the redshift of the nearest
neighbour. For roughly 60% of galaxies, this is a good approximation
(Zehavi et al. 2002). The remainder of the time, this assigns a
redshift to a galaxy that is significantly in error.
Including these galaxies in an analysis will generally decrease the
purity of group memberships, and excluding them will decrease
the completeness. Furthermore, because fiber collisions
preferentially occur in dense regions, and galaxy colour is
correlated with density, including or excluding these galaxies will
likely result in some un-modelled correlated colour error. This
effect is easily studied empirically in an analysis by including and
excluding affected



Figure 16. Inferred halo mass using total group stellar mass vs. inferred
halo mass using total group luminosity for groups with red central galaxies 
(red points with error bars indicating the scatter) and blue central galaxies 
(blue points with error bars indicating the scatter). When galaxy luminosity 
is assumed to be the primary galaxy property determining halo occupation 
to build a mock, and group halo mass is estimated with total stellar mass, 
groups with red galaxies are systematically assigned more massive halo 
mass estimates. Extreme outliers are a result of galaxies with extremely 
red colours and the overly simple mass-to-light ratio conversion used to 
calculate stellar mass. 


galaxies and examining the result on a statistic. We do not address
this source of error in our analysis, and refer the reader to Yang
et al. (2007) for a lengthier discussion of the effect of fiber
collisions on the Yang et al. group finder, where indications are that
the effects are small.


5.4 Improving Galaxy Group Catalogues 

There have been several attempts to “correct” galaxy group
catalogue statistics using the purity and completeness measures
discussed in §3.2 and Appendix B. Tinker et al. (2011) empirically
measured the purity and completeness of satellites in groups as
a function of large scale environment. They then use this average
measurement to correct their results, assuming interlopers are
taken at random from the central population. As previously noted,
Phillips et al. (2014a,b) perform a very similar operation on their
measurements of central-satellite pairs.

In principle, the HTP (§3.3) contains all the information
necessary to correct 1-point statistics. The feasibility and accuracy
of such a scheme is dependent on the stability of the HTP to the
underlying models which one seeks to differentiate between. We leave
such a study to future work, but briefly describe what would be
required. First, a large number of mocks with consistent clustering
statistics, but varying models for assigning the relevant galaxy
property, e.g. colour, would need to be created. A group finder
would then need to be run over each mock, calculating the HTP for
each. The variation in the ensemble of HTPs could then be
quantified, and if the variation is low, then, in principle, the true
underlying colour-dependent statistics can be measured utilizing the
stable HTP.
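
The last step of this procedure, quantifying the variation across an ensemble of HTPs, could look something like the sketch below. It makes a strong simplifying assumption that is not taken from §3.3: each HTP is represented as a small matrix of probabilities over a fixed, discrete set of true and inferred states. The function names and the tolerance are hypothetical.

```python
def htp_ensemble_spread(htps):
    """Element-wise mean and standard deviation over a list of matrices.

    htps : list of equally shaped nested lists, one per mock, where
           entry [i][j] is an (assumed) discretized transition probability.
    """
    n = len(htps)
    rows, cols = len(htps[0]), len(htps[0][0])
    mean = [[sum(m[i][j] for m in htps) / n for j in range(cols)]
            for i in range(rows)]
    std = [[(sum((m[i][j] - mean[i][j]) ** 2 for m in htps) / n) ** 0.5
            for j in range(cols)]
           for i in range(rows)]
    return mean, std


def is_stable(std, tol):
    """True if no matrix element varies by more than tol across the mocks."""
    return max(max(row) for row in std) <= tol
```

Whether a given tolerance is acceptable would depend on the statistic being corrected; the sketch only illustrates the bookkeeping.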

Beyond utilization of the HTP, group finders may be improved
in other ways. We use a simple prescription to distinguish a central
galaxy from satellites in a group, namely we assign the most
luminous galaxy in the group to be the central galaxy. We expect this
to fail some fraction of the time, as demonstrated in §3.1.2, even
if group finders assign galaxy membership perfectly, and we see
evidence for this failure in the HTPs. It is possible that central
assignment accuracy could be increased by using geometric and
velocity information from high multiplicity groups. Improving this
for lower mass groups with fewer members remains difficult. We did
not investigate this issue here and leave it to future work.

The group finding process could be improved and tuned 
for specific tasks. Duarte & Mamon (2014a) analysed the fail¬ 
ure modes of FoF based grouping algorithms and quantified the 
regimes where different linking lengths are most successful in re¬ 
covering group properties. This suggests an adaptive FoF linking 
algorithm may be best suited for recovering groups over a wide 
range of environments and masses, an approach which begins to 
look more similar to the YANG-GF and TINKER-GF methods. 

Ideally, group finders should be used to build groups in a
probabilistic sense to capture the inherent uncertainties in the
process (e.g. in the vein of Botzler et al. 2004; Li & Yee 2008;
Jian et al. 2014; Duarte & Mamon 2014b). Given some model, it is
possible to assign probabilities that individual galaxies reside in a
common group; currently this information is explicitly ignored in
Yang et al. (2005, 2007). During the final stages of preparation of
this manuscript, Duarte & Mamon (2014b) explicitly retained and
used probabilities of group membership in the formulation of a new
group finder, MAGGIE. While it would be very interesting to
investigate whether MAGGIE, or another probabilistic approach to the
grouping problem, alleviates some of the issues investigated
here, we leave this to future work.

Finally, statistics should not be directly inferred from galaxy
group catalogues without careful thought as to how the errors
explored in this paper could affect the measurements. The most
straightforward solution is to forward model the group finding
process before making comparisons to observational data (see Wetzel
et al. 2013 for an example). This can sometimes require the
creation of novel mock galaxy catalogues, but this approach can be
immediately fruitful without modification to current group finding
algorithms.


6 SUMMARY 

We described three types of errors group finding is subject to. These 
are errors in: 

(i) halo mass estimation 

(ii) central/satellite designation 

(iii) group membership determination 

and in practice, errors in each of these categories occur and are
coupled. When galaxies that reside in a common halo are
misidentified to be in two or more groups, a process called
fracturing, group finders tend to misidentify satellites as centrals
and underestimate group mass. When galaxies that reside in two or
more haloes are misidentified to be in a common group, a process
called fusing, group finders tend to misidentify centrals as
satellites and overestimate group mass. For a given group finder (and
mock), this correlated error is characterized by the halo transition
probability (HTP).
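
As an illustration of the two membership error modes, the following sketch flags fractured haloes and fused groups given a true halo label and an inferred group label for every galaxy. It is a simplified bookkeeping example with hypothetical names, not the procedure used to compute the HTP.

```python
def classify_membership_errors(halo_ids, group_ids):
    """Identify fractured haloes and fused groups.

    halo_ids  : true host halo label of each galaxy
    group_ids : group label assigned to each galaxy by a group finder
    Returns (fractured_haloes, fused_groups) as two sets of labels.
    """
    groups_of_halo = {}
    haloes_of_group = {}
    for halo, group in zip(halo_ids, group_ids):
        groups_of_halo.setdefault(halo, set()).add(group)
        haloes_of_group.setdefault(group, set()).add(halo)
    # Fracturing: members of one halo are spread over several groups.
    fractured = {h for h, gs in groups_of_halo.items() if len(gs) > 1}
    # Fusing: one group contains members of several haloes.
    fused = {g for g, hs in haloes_of_group.items() if len(hs) > 1}
    return fractured, fused
```

A halo and a group can appear in both sets at once, which corresponds to the case where fusing and fracturing occur simultaneously.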

Errors in group finding will affect inferred colour-dependent 
occupation statistics measured directly with a galaxy group cat¬ 
alogue. To characterize the specific errors made, we used three 
group finders to measure colour-dependent occupation statistics on 
a mock galaxy survey created using the age-matching technique. 
For the set of occupation statistics explored in this study, group 
finder inferred measurements: 

(i) tend to equalize the properties of distinct galaxy sub¬ 
populations 

(ii) are able to recover 1-halo conformity (but also induce a 
small 1-halo conformity signal) 

(iii) have errors that are reduced by the presence of realistic
2-halo conformity in our mock.

We examined the ability to directly infer the colour-dependent
occupation statistics $\langle N_{\rm gal}|M\rangle$,
$f_{\rm sat}(L_{\rm gal})$, $\Phi(L_{\rm gal}|M)\,{\rm d}L$,
$f_{\rm red}(M_{\rm halo})$, and 1-halo conformity from galaxy group
catalogues. The first moment of the HOD is largely recovered for the
full sample of galaxies, only varying significantly from the true
relation for the sub-dominant blue galaxy sub-population. The
satellite fraction as a function of galaxy luminosity is sensitive to
the halo definition applied when tuning a group finder, but when this
definition is close to that of the underlying mock, group finders
recover this statistic very well. The CLF is also recovered
accurately for the full galaxy sample, only showing some systematic
deviation for the colour split sample, where the sub-dominant blue
population of centrals and satellites is over-estimated. The red
fraction of galaxies as a function of halo mass is poorly recovered
by group finders, particularly the central red fraction, an important
result that may have implications for the inference of halo-galaxy
quenching relations. Group finders recover a 1-halo conformity signal
of the same magnitude as the underlying signal when one exists, but
display a tendency to induce a signal in a mock constructed to have
no conformity.

To conclude, colour-dependent occupation statistics inferred
from galaxy group catalogues are affected by group finding errors
in non-trivial ways. Measurements of galaxy (-group) statistics
require either fully modelling the group finding process and its
errors using the HTP, or forward modelling the group finding process
to make a fair comparison to data.

APPENDIX A: SHUFFLED MOCK

The age-matching based mock has galactic conformity built into
it. As such, it is ideally suited to test whether galaxy group finders
can recover this conformity signal, both qualitatively and
quantitatively. However, we also would like to perform a null-test,
based on a mock catalogue without galactic conformity. To do so, we
construct a second mock derived from the age-matching mock by
shuffling galaxy populations among haloes of similar mass.

In detail, the shuffling consists of the following steps. First we
look up all haloes from the ROCKSTAR Bolshoi z = 0 halo catalogue
which did not receive a galaxy when applying the abundance
matching technique described above and add these to our mock
catalogue with null values for all galaxy properties. Next we bin
haloes in small, 0.1 dex, bins in halo mass. We then shuffle entire
galaxy groups, i.e. galaxies which occupy the same host halo, amongst
haloes in a bin (allowing only one group per halo), preserving the
relative positions between a central galaxy and its satellite(s).
Note that this destroys the galaxy-sub-halo connection. This
shuffling removes any 2-halo correlation between galaxy groups (e.g.,
2-halo conformity) present in the age-matching mock. Next we remove
all empty haloes from the new catalogue and apply one additional
shuffling. Haloes are again binned in small, 0.1 dex, bins in halo
mass, and this time only satellite galaxies are shuffled among the
haloes in a bin, whereby we preserve for each satellite galaxy its
relative distance from the halo centre such that
$x_{\rm sat} - x_{\rm cen}$ is constant throughout the shuffling
process. This shuffling destroys any correlation between the
properties of centrals and satellites (e.g., 1-halo conformity). In
the text we refer to this mock as the “shuffled mock” and the
unaltered version as the “age-matching mock”.
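
The first shuffling step can be sketched as a permutation of group labels among haloes that share a mass bin. The code below is a minimal illustration and not the code used to build our mock. The function name, the use of a placeholder label for empty haloes, and the random seed are all assumptions. Because each group is moved as a unit, the positions of satellites relative to their central are implicitly preserved.

```python
import math
import random
from collections import defaultdict


def shuffle_groups_in_mass_bins(log_mass, group_ids, bin_width=0.1, seed=0):
    """Permute whole galaxy groups among haloes of similar mass.

    log_mass  : log10 halo mass of each halo
    group_ids : label of the galaxy group hosted by each halo
                (a placeholder label marks an empty halo)
    Returns a new list giving the group label hosted by each halo.
    """
    rng = random.Random(seed)
    # Collect the indices of haloes falling in each mass bin.
    bins = defaultdict(list)
    for index, value in enumerate(log_mass):
        bins[math.floor(value / bin_width)].append(index)
    shuffled = list(group_ids)
    for members in bins.values():
        contents = [group_ids[i] for i in members]
        rng.shuffle(contents)  # one group per halo is kept by construction
        for i, group in zip(members, contents):
            shuffled[i] = group
    return shuffled
```

The second shuffling step, which exchanges only satellites between haloes in a bin, could follow the same pattern with per-satellite labels instead of per-group labels.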

The age-matching mock and shuffled mock have the exact
same HOD, $P(N_{\rm gal}|M_{\rm vir})$, by design. However, the
clustering statistics of the galaxies change. The auto 2-point
correlation of galaxies in the age-matching mock is slightly higher
than in the shuffled mock. This is because we shuffle galaxies at
fixed halo mass, and not fixed $V_{\rm peak}$, the value used to
populate galaxies in haloes when building the age-matching mock (see
Zentner, Hearin & van den Bosch 2014).


APPENDIX B: GROUP FINDING ERRORS: INDIVIDUAL 
HALOES AND GROUPS 

Throughout this section, we define the subscript ‘g’ to mean “group 
member” and ‘ng’ to mean “not a group member”. Similarly, we 
define the subscript ‘h’ to mean “halo member” and ‘nh’ to mean 
“not a halo member”. If one desires to compare the membership 
of a group to a halo, it is necessary to define a unique connec¬ 
tion between every group and halo. This is a non-trivial choice, and 
the decision may strongly influence the resulting group/halo level 
statistics of interest. We do not make a specific choice for this dis¬ 
cussion, but we will briefly discuss possible choices used in the 
literature. 

One may define the halo associated with a group as the halo
associated with a true central group member (e.g. Yang et al. 2007;
Munoz-Cuartas & Muller 2012); if the group contains more than one
true central, the halo is taken to be that of the true central of
the most massive halo (or of the galaxy identified as the central of
the group). As a result, any group that does not contain a true
central has no associated halo (Fig. B1, results b, e, f).
Additionally, any group that contains more than one true central
will cause one or more haloes to have no associated group (Fig. B1,
results d, f). Variants on this method are possible. If more than
one halo’s central is in a group, Munoz-Cuartas & Muller (2012)
connect the group to the halo with the most members in the group.

Another possible choice is to define the group associated with
a halo as the group associated with the largest number of halo
members, or majority (e.g. Eke et al. 2004; Robotham et al. 2011).
This definition may result in one or more groups being associated
with no halo. For example, any halo that has undergone a pure
fracturing process during group finding will produce two or more
groups, where, for each group, all members belong to a common halo
(Fig. B1, results b, c). This may also happen for a combination of
fusing and fracturing of haloes. Additionally, a contingency should
be defined for when two or more haloes have equal numbers of
associated group members with no halo having more.
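
A majority-based connection of this kind can be sketched as follows. This is only an illustration with hypothetical names. In particular, the tie contingency is an assumption: here a tie is taken to mean that two groups hold equal numbers of a halo's members, and the halo is then left unmatched rather than resolved by some further rule.

```python
from collections import Counter


def match_haloes_to_groups(halo_ids, group_ids):
    """Connect each halo to the group containing most of its members.

    halo_ids  : true host halo label of each galaxy
    group_ids : group label assigned to each galaxy by a group finder
    Returns a dict mapping halo label to group label, or to None on a tie.
    """
    members = {}
    for halo, group in zip(halo_ids, group_ids):
        members.setdefault(halo, Counter())[group] += 1
    match = {}
    for halo, counts in members.items():
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            match[halo] = None  # tie: an explicit contingency is needed
        else:
            match[halo] = ranked[0][0]
    return match
```

Groups that never appear as a value in the returned mapping are those left with no associated halo, as happens for the smaller fragments of a fractured halo.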

Other options are of course possible. Merchan & Zandivarez
(2002) associate groups to haloes by matching the centre of mass
between groups in redshift space and haloes in real space, while
Calvi et al. (2011) match two group catalogues as a cross-check in
a similar way.

Once a halo-group connection has been made, we can define 
the total number of group members as: 

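$$ N_{\rm g} = N_{\rm g|h} + N_{\rm g|nh} \qquad ({\rm B1}) $$

and the total number of halo members as:

$$ N_{\rm h} = N_{\rm g|h} + N_{\rm ng|h}. \qquad ({\rm B2}) $$

We define the completeness of an individual group’s membership
as:

$$ C_{\rm mem} = \frac{N_{\rm g|h}}{N_{\rm g|h} + N_{\rm ng|h}} \qquad ({\rm B3}) $$

and the purity as:

$$ P_{\rm mem} = \frac{N_{\rm g|h}}{N_{\rm g|h} + N_{\rm g|nh}} \qquad ({\rm B4}) $$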

Figure B1. Diagram of the possible outcomes of placing galaxies which
live in haloes (solid circles) into groups (dashed circles). One goal of a
group finder is to assign all galaxies which occupy a common halo, and only
those galaxies, to a unique group (result a). Two processes result in errors
in the group finding process: fracturing, where galaxies which live in a
common halo are assigned to multiple groups (results b, c), and fusing, where
galaxies from two distinct haloes are assigned to a common group (result d).
These two processes may also occur simultaneously (results e, f).


One more useful quantity is the transition factor:

$$ T_{\rm mem} = \frac{N_{\rm g|h} + N_{\rm g|nh}}{N_{\rm g|h} + N_{\rm ng|h}} = \frac{C_{\rm mem}}{P_{\rm mem}}. \qquad ({\rm B5}) $$

Note that $T_{\rm mem} = 1$ implies $C_{\rm mem} = P_{\rm mem}$. In the
limit where $\langle T_{\rm mem}\rangle(M_{\rm h}) = 1$, the HOD may be
recovered. However, $T_{\rm mem}$ close to 1 only implies that purity
equals completeness; it does not require that the purity and
completeness be near 1. So, in general, it is desirable to tune a
group finder such that $C_{\rm mem}$, $P_{\rm mem}$, and
$T_{\rm mem}$ are all close to 1. The ideal balance of these three
conditions will depend on the goal of the study for which the group
finder is being tuned.
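
The point that a transition factor of unity does not guarantee high purity or completeness is easy to check numerically. The sketch below takes completeness as the fraction of halo members recovered in the group and purity as the fraction of group members belonging to the halo, which is the choice that reproduces Eq. (B5). The function and argument names are hypothetical, and the group and halo are assumed to be non-empty.

```python
def membership_statistics(n_g_h, n_g_nh, n_ng_h):
    """Completeness, purity and transition factor for one group-halo pair.

    n_g_h  : galaxies that are both group members and halo members
    n_g_nh : group members that are not halo members (interlopers)
    n_ng_h : halo members that are not group members (missed galaxies)
    """
    n_group = n_g_h + n_g_nh   # total group members, Eq. (B1)
    n_halo = n_g_h + n_ng_h    # total halo members, Eq. (B2)
    completeness = n_g_h / n_halo
    purity = n_g_h / n_group
    transition = n_group / n_halo  # equals completeness / purity, Eq. (B5)
    return completeness, purity, transition
```

For example, a group that misses two of eight halo members but gains two interlopers has a transition factor of exactly one while both completeness and purity are 0.75.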

We choose not to use the approach outlined above as the 
choice of how to define a halo-group connection is arbitrary, and 
we expect that the choice will significantly affect the results of any 
purity and completeness analysis. 


APPENDIX C: DIFFICULTIES IN DEFINING GROUPS IN 
SIMULATIONS 

There are two phenomena present in sub-halo abundance match¬ 
ing (SHAM) mocks which one should be aware of when compar¬ 
ing the results of a group finder to a SHAM mock. First, spherical 
over-density based halo finders run on N-body simulations to build 
the mock will sometimes disjoin a halo while it is being accreted 
into a new host halo (or more generally, while passing through the 
virial volume of a larger halo). When this occurs, some particles of 
the now disjoint halo remain outside the virial volume of the new 
host halo. This can have the effect of separating central and satel¬ 
lite populations that are part of one halo (or have been in the past), 
but which have been separated at a particular time-step in a halo 
catalogue. A particularly pernicious outcome can be a group in the
mock with no true central galaxy, because the central galaxy has
been accreted into a new host halo while one or more of its
satellites have not (see top panel of Fig. C1). We refer to these
satellites as abandoned satellites. A group that consists entirely
of abandoned satellites creates a problem when comparing the result
of a group finder to the mock, as the mock has already violated an
ansatz of the group finder, namely that every group must have a
central galaxy.

There is a second effect to be aware of when comparing a 
mock to group finder results which occurs when defining a volume 
limited complete galaxy sample. It is possible to create groups with 
a satellite and no central when a central galaxy of a group does not 
make the cut, while a satellite does. We refer to this phenomenon as 
a “ghost” central. This results in a group with no central, but only in 
the sense that no central made the sample cut. Nevertheless, such
a group also violates the assumption that every group must have a
central galaxy.
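
Both effects can be located in a mock with a simple check for haloes that retain galaxies in the sample but no central. The sketch below uses hypothetical names and assumes that each mock galaxy carries a host halo label, a central flag, and a flag marking whether it passes the sample cut.

```python
def haloes_without_central(halo_ids, is_central, in_sample):
    """Return haloes with at least one sample galaxy but no sample central.

    halo_ids   : host halo label of each mock galaxy
    is_central : True if the galaxy is the central of its host halo
    in_sample  : True if the galaxy passes the sample selection
    """
    has_member = set()
    has_central = set()
    for halo, central, keep in zip(halo_ids, is_central, in_sample):
        if not keep:
            continue
        has_member.add(halo)
        if central:
            has_central.add(halo)
    return has_member - has_central
```

A halo flagged by this check hosts either abandoned satellites (no central exists in the mock) or satellites with a ghost central (the central exists but fails the cut); distinguishing the two requires repeating the check with the selection switched off.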

We do not account for either of these effects when calculating
group statistics or when shuffling the mock to create our shuffled
mock catalogue (described in Appendix A). These two effects together
affect ~1% of all satellites in the age-matching mock, with there
being approximately equal numbers of abandoned satellites and
satellites with a ghost central.

This paper has been typeset from a TeX/LaTeX file prepared by the
author.

Figure C1. Diagram of the possible pernicious effects when placing
galaxies which live in haloes (solid circles) into groups (dashed circles).
Top: an abandoned satellite. If two haloes overlap, such that the central
galaxy of the less massive halo falls within the virial volume of the larger
halo, while one of its satellites does not, the satellite will occupy a group
with no central, as its previous central is now a satellite of the larger
halo. Bottom: a group with a ghost central. Because there is scatter in
the $V_{\rm peak}$-$L_{\rm gal}$ relation, a halo’s central galaxy may have a
luminosity below the magnitude limit of the sample, while the satellite has a
luminosity above the limit, resulting in a group with no central.








